ABSTRACT


Linear and nonlinear digital communication channels with memory are
considered, with the goal of providing a number of general tools for
the analysis and design of coding and modulation schemes operating on
such channels.
FOREWORD - MOTIVATION AND SUMMARY OF THE WORK


The  increasing  demand  for  high-speed  digital  communication  has 
resulted  in  new  theoretical  problems  for  communication  engineers.  In 
particular,  analytical  tools  are  called  for  that  allow  the  performance 
of  digital  transmission  systems  to  be  evaluated  accurately,  and  coding 
or  modulation  schemes  to  be  designed  properly. 

The channel model dealt with in this Report is a bandlimited one,
possibly nonlinear. In most digital communication systems, only a
restricted bandwidth is available, for an efficient use of the
frequency spectrum. Moreover, in some cases, such as satellite
repeaters, amplifiers work at or near saturation for better efficiency,
so that the combined effects of bandlimiting and nonlinearity must be
taken into account for a proper analysis of the system.

The purpose of this Report is to gather a number of analytical tools
that have been devised for designing or analyzing coding and modulation
schemes for digital bandlimited communication channels, i.e., channels
with memory.

Chapter 1 is devoted to the description of some coding and modulation
schemes devised for real-life channels. Essentially, two philosophies
are involved in those schemes. The first is intended to restrict the
symbol stream to form only sequences that are well matched to the
channel features. The second is to constrain the power spectrum of
transmitted signals in order to concentrate it in spectral regions where
the channel response is better.

Design  techniques  based  on  both  philosophies  are  considered  in 
Chapter  2. 


In Chapter 3, the problem of characterizing continuous nonlinear
channels with memory is considered. A Volterra-series approach is taken,
and a method is proposed for computing the error probability at the
output of such channels.

Finally, in Chapter 4, a Markov-chain technique is proposed for
modeling discrete-time, nonlinear channels with memory. This model can
be used for several applications: among them, I shall consider the
computation of power spectra at the channel output, the derivation of
maximum-likelihood sequence demodulators and the evaluation of error
probability.


CHAPTER 1 - SOME EXAMPLES OF CODING AND MODULATION SCHEMES
FOR CHANNELS WITH MEMORY


1.1 INTRODUCTION

In this chapter, I shall describe some examples of coder-modulator
pairs devised, and sometimes actually used, for transmitting digital
information over channels with memory. This is intended to provide
motivation for the material of later Chapters; there, some general tools
will be developed that will allow these transmission schemes to be
analyzed for the purpose of deriving general performance indices, such
as spectral occupancy or error probability.

The transmission scheme with which we shall be dealing is not uncommon
in Information Theory (see Fig. 1.1). We will assume that the
information source generates a sequence of B-ary symbols $a_n$,
$-\infty < n < \infty$, which are equally likely and statistically
independent of each other.

The information sequence is sent to a coder, which maps it into a
sequence of M-ary symbols $b_n$, and then to a modulator. We shall
assume that the modulator is memoryless, i.e., it performs a one-to-many
mapping of coded M-ary symbols into a set of waveforms of equal duration
T (say).

For generality, we shall allow coded symbols and modulator waveforms to
be either real or complex. In particular, symbols and waveforms will be
assumed to be real when the transmission channel is baseband, and
complex when the transmission is bandpass. This convention is based on
the use of complex envelopes to represent bandpass signals and systems.

The pioneering work of Nyquist (1924, 1928) was devoted to optimizing
the modulation process, i.e., the choice of waveforms that allow the
effects of the channel memory to be coped with when the channel is
assumed to be bandlimited, linear and perfectly known.

More recently, it has been recognized that coding can be used to
improve the performance of a data communication system. Codes can be
used to provide protection against certain types of errors that may
occur during transmission; however, most of the results of coding theory
are based on a memoryless channel assumption. Although it has been
experimentally recognized that some codes designed to operate on a
memoryless channel can improve the performance of a channel with memory,
no general results are available in the area of coding for channels with
memory.

Coding schemes have been developed for use in pulse-code modulation
(PCM) systems, high-speed data communication systems, and high-density
magnetic recording. As we shall see more thoroughly in Chapters 1 and 2,
these schemes are designed to cope with unwanted features of the
channel. For example, if the channel exhibits a poor frequency response
in some regions, one may want the transmitted signal to contain
relatively little power in the imperfect regions of the channel. In this
way, we can reasonably expect that the amount of distortion produced by
the channel will be reduced with respect to the case of no coding.

A slightly different philosophy leads to the concept of codes designed
in the time domain. The basic idea is to avoid transmitting certain
symbol sequences that are not well matched to the channel
characteristics. For example, sequences giving rise to a constant,
non-zero voltage on a transmission channel must be avoided if the
channel has a poor response around the zero frequency.

In the next section, we shall consider some common examples of binary
(M=2), ternary (M=3) and multilevel (M>3) codes designed for
transmission of digital data. For other examples see Croisier (1970)
and Kobayashi (1970 and 1971), as well as the references therein.

1.2  BINARY  CODES 

The simplest way to transmit binary digital data on a line is to
associate two different waveforms, say $s_0(t)$ and $s_1(t)$, to source
output symbols "0" and "1". Since generally the medium to be used for
transmission has a poor dc response, one wants to avoid transmitting
a dc term. If source zeros and ones are equally likely, the transmitted
signal is dc-free if

$$ s_1(t) = -s_0(t) \qquad (1.1) $$

A common signaling format for baseband signals uses positive and
negative rectangular pulses with duration T, where $T^{-1}$ is the rate,
in binary symbols/sec, at which the source outputs its symbols. In the
simplest format, called non-return-to-zero (NRZ), a source "one" is
thus represented by one voltage level, and a "zero" by its negative.
Transmission of dc is thus avoided, but the power spectrum of this
signal format is concentrated around the zero frequency, a spectral zone
where the response of the transmission medium is usually poor.

A better distribution of signal energy can be obtained using two
different waveforms $s_0(t)$ and $s_1(t)$, i.e., using a different
modulation scheme. A possible choice is depicted in Fig. 1.2. With this
choice (Split-phase, or Manchester code, or binary phase modulation),
the power of the transmitted signal is concentrated toward higher
frequencies, where the medium response is expected to be better.




A more sophisticated solution can be achieved by coding the source
output. An interesting example of such a scheme, known as Delay
Modulation or Miller Code (Hecht and Guida, 1969), can be modeled as
follows: consider the 4-state sequential machine, driven by source
outputs, depicted in Fig. 1.3.

To each of the four states, a waveform is associated according to
the rule summarized in Fig. 1.4. Every source symbol output produces a
transition from one state to another (Fig. 1.3); an example of the
waveform at the output of the modulator is given in Fig. 1.5.

As a result, with this code, the signal power spectrum is highly
concentrated around frequencies less than 0.5/T; moreover, the spectrum
is small in the vicinity of the zero frequency. This code is actually
used for magnetic tape recording; it is also attractive for
phase-shift-keyed signaling (see Lindsey and Simon, 1973, p. 11).

Binary  block  codes  with  frequency  spectrum  constraints 

Gorog (1968) has analyzed families of binary block codes that are
both redundant (i.e., error-detecting) and have the signal energy
concentrated into a predetermined range of the frequency spectrum.

Consider a sequence of N coded symbols $b_0, b_1, \ldots, b_{N-1}$, and
assume that each symbol, taking the value $\pm 1$, amplitude-modulates a
basic waveform s(t) with duration T. The signal at the output of the
modulator is

$$ x(t) = \sum_{i=0}^{N-1} b_i \, s(t - iT) \qquad (1.2) $$

and its Fourier transform is

$$ X(\omega) = S(\omega) \sum_{i=0}^{N-1} b_i \, e^{-j i \omega T}, \qquad j = \sqrt{-1} \qquad (1.3) $$

where $S(\omega)$ is the Fourier transform of s(t).

As can be seen, the code contributes to the spectrum through the
factor

$$ C(\omega) = \sum_{i=0}^{N-1} b_i \, e^{-j i \omega T} \qquad (1.4) $$

We shall now see how to construct codes that give rise to zeros
occurring at $\omega = 0$ or $\omega = \pi/T$, which are frequency
values where channel imperfections usually occur.

Consider first $\omega = 0$, i.e., the coded sequences whose spectrum is
zero at the origin. A code of this kind is useful for transmission over
channels with poor response at low frequencies.

To get C(0)=0, the condition

$$ \sum_{i=0}^{N-1} b_i = 0 \qquad (1.5) $$


must hold; i.e., the coded sequence must contain the same number of +1's
as -1's. If $N = k\nu$, the sequence can be decomposed into k
subsequences of $\nu$ symbols each. Thus, a sufficient condition for
(1.5) to be satisfied is that

$$ \sum_{i=(j-1)\nu}^{j\nu-1} b_i = 0, \qquad j = 1, 2, \ldots, k \qquad (1.6) $$

that is, the j-th subsequence must contain the same number of +1's
and -1's (this requires, of course, that $\nu$ be even).

If each subsequence is a codeword, the code has no more than

$$ \binom{\nu}{\nu/2} $$

codewords.

Another constraint that can be imposed is that $X(\omega)$ be zero at
$\omega = \pi/T$. This can be obtained if $C(\pi/T) = 0$; i.e.,

$$ \sum_{i=0}^{N-1} b_i (-1)^i = 0 \qquad (1.7) $$


A sufficient condition for (1.7) to be satisfied is

$$ \sum_{i=(j-1)\nu}^{j\nu-1} b_i (-1)^i = 0, \qquad j = 1, 2, \ldots, k \qquad (1.8) $$

For $\nu$ even, a subsequence will satisfy this condition if a reversal
of the signs of its odd-numbered bits produces a subsequence with the
same number of +1's as -1's. There are

$$ \binom{\nu}{\nu/2} $$

different subsequences of length $\nu$ that meet this condition.


Gorog (1968) also gives solutions to the case when $\nu$ is odd, and
when the constraint that $C(0) = C(\pi/T) = 0$ is imposed. For example,
in the case $\nu = 8$, there are 36 codewords such that the spectrum of
the sequence will have a zero at $\omega = 0$ and $\omega = \pi/T$. The
resulting code is represented in Table 1.1; notice that the number of
codewords (36) makes this code useful for transmission of alphanumeric
symbols (10 numbers and 26 letters).
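The codeword counts quoted above can be checked by brute force. The following is a minimal sketch, not Gorog's construction: it simply enumerates all ±1 words of length 8 and keeps those satisfying the per-codeword conditions (1.6) and (1.8). The function name `spectral_null_words` and its keyword arguments are illustrative choices.

```python
from itertools import product
from math import comb

def spectral_null_words(v, null_at_zero=True, null_at_pi=True):
    """Return all +/-1 words of length v meeting the requested null conditions."""
    words = []
    for w in product((+1, -1), repeat=v):
        plain_sum = sum(w)                                               # condition (1.6)
        alternating_sum = sum(b * (-1) ** i for i, b in enumerate(w))    # condition (1.8)
        if null_at_zero and plain_sum != 0:
            continue
        if null_at_pi and alternating_sum != 0:
            continue
        words.append(w)
    return words

both_nulls = spectral_null_words(8)
only_dc_null = spectral_null_words(8, null_at_pi=False)
only_pi_null = spectral_null_words(8, null_at_zero=False)
print(len(both_nulls), len(only_dc_null), len(only_pi_null), comb(8, 4))
```

Imposing both conditions forces the even-indexed and the odd-indexed symbols to be balanced separately, which is why the count for length 8 factors as the square of the number of balanced words of length 4.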

1.3 TERNARY CODES

A certain amount of transmission power can be saved if, within the
set of waveforms used by the modulator, one includes a zero waveform.
In a binary code, the simplest such format is the so-called unipolar
format; the binary symbols 1 and 0, left uncoded, are transmitted as
presence and absence of pulses, respectively.

With this format, long sequences of 1's can occur which result in
dc wander, since the repeaters along the line are not dc-coupled to the
cable medium. Thus, one can use three waveforms in the modulator: the
zero waveform and two non-zero waveforms with equal shape and opposite
polarities.

One of the simplest examples is the bipolar code (Aaron, 1952) used
in Bell System's T1 carrier PCM system. In bipolar, the source binary
symbol 0 is represented by no signal on the line, and the binary symbol
1 is represented alternately by positive and negative pulses. The
effects of dc wander are reduced, since a pulse of one polarity is
certainly followed by a pulse of the opposite polarity.

Codes with B=2 and M=3 are often referred to as pseudoternary (PT)
codes. The bipolar format is a simple example of pseudoternary code;
more sophisticated schemes of such codes will be described later on (see
also Croisier (1970), and references therein).

Linear ternary codes

A ternary code is called linear time-invariant if the coded symbols
can be derived from the binary source symbols through the linear
relation

$$ b_n = \Gamma(D) \, a_n \qquad (D = \text{delay operator}) \qquad (1.9) $$

where $\Gamma(D)$ is a polynomial that defines the code, while
$a_n = \pm 1$ according to the source output.

In order for the $b_n$ to have only three possible values for any
sequence of source symbols, it is necessary that $\Gamma(D)$ have only
two non-zero coefficients, and that they be either equal or opposite. We
have three basic linear PT codes:

      Duobinary:                    $\Gamma(D) = 1 + D$

      Twinned binary:               $\Gamma(D) = 1 - D$

      Class-IV partial response:    $\Gamma(D) = 1 - D^2$

Duobinary code is characterized by the absence of direct transitions
between levels + and -, but is subject to unwanted dc wander. Twinned
binary and class-IV partial-response codes are free of dc.
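The properties just listed can be illustrated numerically. The sketch below applies each polynomial to a random ±1 source sequence; the helper name `linear_code`, the tap lists, and the choice of +1 for the symbols preceding the start of the sequence are assumptions made only for this illustration. With $a_n = \pm 1$, the three output levels come out as -2, 0 and +2.

```python
import random
from itertools import accumulate

def linear_code(a, taps):
    """b_n = sum_k taps[k] * a_{n-k}; symbols before the start are taken as +1."""
    offset = len(taps) - 1
    padded = [1] * offset + list(a)
    return [sum(t * padded[offset + n - k] for k, t in enumerate(taps))
            for n in range(len(a))]

# Coefficients of Gamma(D) in increasing powers of D.
CODES = {
    "duobinary": [1, 1],                      # 1 + D
    "twinned binary": [1, -1],                # 1 - D
    "class-IV partial response": [1, 0, -1],  # 1 - D^2
}

random.seed(0)
a = [random.choice((-1, 1)) for _ in range(2000)]
coded = {name: linear_code(a, taps) for name, taps in CODES.items()}

for name, b in coded.items():
    running_sum = list(accumulate(b))
    print(name, sorted(set(b)), "max |running sum| =", max(abs(s) for s in running_sum))
```

The running sum stays bounded for the two dc-free codes because the sum telescopes, while for duobinary it behaves like a random walk, which is the dc wander mentioned in the text.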

Linear codes usually suffer a common drawback: in either case, the
erroneous reception of one ternary element causes a possibly infinite
error propagation. In fact, $a_n$ can be recovered from $b_n$ according
to the rule:


$$ a_n = \ldots \qquad (1.10) $$

TABLE  1.1 
and error propagation is avoided. It can be shown that, if
$(a_n)_{n=-\infty}^{\infty}$ is a sequence of IID binary random
variables, $(a'_n)_{n=-\infty}^{\infty}$ is again a sequence of IID
binary random variables, so that the precoding operation does not alter
source statistics.

For example, if $\Gamma(D) = 1 - D$, the precoding operation is

$$ a'_n = a_n \oplus a'_{n-1} \qquad (1.16) $$

where $\oplus$ denotes modulo-2 addition.

It can be seen that precoded twinned binary is equivalent to the
bipolar code.
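This equivalence is easy to check on a computer. The sketch below is written under stated assumptions: source and precoded symbols are handled as bits 0/1, the precoder starts from an initial bit 0, and the 1 - D operation is applied directly to the 0/1 levels, so the output levels are -1, 0, +1 (half the ±2 levels obtained with ±1 symbols). The function names are illustrative.

```python
import random

def precoded_twinned_binary(bits):
    """Precode with a'_n = a_n XOR a'_{n-1}, then apply 1 - D to the precoded bits."""
    previous = 0                         # assumed initial precoder state
    out = []
    for a in bits:
        current = a ^ previous           # precoding rule (1.16)
        out.append(current - previous)   # twinned binary on 0/1 levels
        previous = current
    return out

def bipolar(bits):
    """0 -> no pulse; 1 -> pulses of alternating polarity, starting with +1."""
    polarity = +1
    out = []
    for a in bits:
        if a == 1:
            out.append(polarity)
            polarity = -polarity
        else:
            out.append(0)
    return out

random.seed(3)
source = [random.randint(0, 1) for _ in range(1000)]
print(precoded_twinned_binary([1, 1, 0, 1]), bipolar([1, 1, 0, 1]))
```

A source 0 leaves the precoded bit unchanged, so the difference is zero; a source 1 toggles it, so successive nonzero differences alternate in sign, which is exactly the bipolar rule.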


Nonlinear  ternary  codes 

We shall now describe some of the ternary codes that have a practical
significance in data communications. Following Croisier (1970), we shall
distinguish between alphabetic and nonalphabetic codes.

A.  Alphabetic  ternary  codes 

Source binary data are first framed into blocks, and each is encoded
in a ternary block. The block length can be kept constant, or can be
variable. For constant-length codes, with binary-block length K and
ternary-block length N, there are $2^K$ binary blocks and $3^N$ ternary
blocks; so we must have

$$ 2^K \le 3^N \qquad (1.17) $$

or

$$ K \le N \log_2 3 \qquad (\log_2 3 = 1.58) \qquad (1.18) $$

We see that alphabetic ternary codes may potentially increase the data
rate up to 58% over uncoded binary.


Pair-selected ternary (Sipress, 1965)

A widely known code of this type is Paired Selected Ternary (PST).
In this code, the incoming binary sequence is framed into blocks of
length K=2, and each binary block is coded according to the scheme of
Table 1.2 (N=2).


      Binary       Positive mode       Negative mode

       0 0
       0 1
       1 0
       1 1

Table 1.2 - Paired Selected Ternary Code

The  mode  is  reversed  each  time  a binary  10  or  01  block  is  encountered. 

The purpose of this code is to eliminate dc and the ternary character
00, simultaneously. In fact, since timing information must be extracted
from the pulse train by regenerative repeaters, long sequences of zeros
result in long periods without timing information.

We  see  that  there  are  two  possible  ternary  characters  encoding  the 
binary  blocks  01  and  10:  the  dc  is  balanced  by  alternating  the  two 
kinds  of  blocks. 

Codes  with  K=4,  N=3 

With N=3, $3^3 = 27$ possible ternary blocks are available. A code can
be designed by choosing properly the correspondence between binary and
ternary blocks.

The basic ideas are:

a. Eliminate the ternary character 000.

b. The following ternary blocks:

      0+-   +0-   +-0
      0-+   -0+   -+0

   are dc-free and can be used without any restriction.

c. The remaining 20 have non-zero dc: ten of them have positive dc,
   and ten negative. They can be paired to represent ten different
   binary blocks; with each binary block, one will associate either one
   ternary block or the other so as best to compensate the dc.
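The block counts used in points a to c can be verified by enumerating the 27 ternary blocks. This is only a counting sketch (the variable names are illustrative); it does not reproduce any particular code table.

```python
from itertools import product

blocks = list(product((-1, 0, 1), repeat=3))               # all 27 ternary blocks

zero_block = [b for b in blocks if not any(b)]             # 000, to be eliminated
dc_free = [b for b in blocks if any(b) and sum(b) == 0]    # usable without restriction
positive_dc = [b for b in blocks if sum(b) > 0]
negative_dc = [b for b in blocks if sum(b) < 0]

print(len(blocks), len(zero_block), len(dc_free), len(positive_dc), len(negative_dc))

# Each positive-dc block has its sign-inverted partner among the negative-dc blocks.
inverse_pairs = [(b, tuple(-x for x in b)) for b in positive_dc]
```

Six unrestricted blocks plus ten pairs give sixteen usable assignments, which is exactly the number of binary blocks of length K=4.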

The first such code is called 4B-3T (Jessop, 1968). Here, the
characters in each pair are inverses of each other; e.g.:

      + + +   and   - - -
      + 0 0   and   - 0 0
      etc.

Another code, described by Franaszek (1968), is based on a more
sophisticated attribution of ternary blocks. Franaszek's MS43 code is
generated under constant monitoring of the running digital sum (RDS),
defined as

$$ \sigma_\nu = \sum_{n=\mu}^{\nu} b_n \qquad (1.19) $$

where $\mu$ is arbitrary but fixed. When studying a code without a dc
component, it is very important to know the maximum digital sum
variation (DSV) of this code. The DSV, defined as

$$ \mathrm{DSV} = \max_\nu \sigma_\nu - \min_\nu \sigma_\nu \qquad (1.20) $$

is a parameter that measures roughly the distortion suffered by a coded
signal passing through a channel that does not transmit direct current.
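For a finite stretch of coded symbols, definitions (1.19) and (1.20) translate directly into a few lines of code. The sketch below takes the first symbol of the list as the starting index of the sum; the two test sequences are made-up illustrations, not codewords from any of the codes discussed here.

```python
from itertools import accumulate

def running_digital_sum(b):
    """RDS of (1.19), with the sum started at the first symbol of the list."""
    return list(accumulate(b))

def digital_sum_variation(b):
    """DSV of (1.20), evaluated over the finite sequence given."""
    rds = running_digital_sum(b)
    return max(rds) - min(rds)

bipolar_example = [+1, 0, -1, 0, 0, +1, -1, +1, 0, -1]   # alternating pulse polarity
unbalanced_example = [+1, +1, 0, +1, -1, +1]             # polarity not balanced

print(running_digital_sum(bipolar_example), digital_sum_variation(bipolar_example))
print(running_digital_sum(unbalanced_example), digital_sum_variation(unbalanced_example))
```

For the bipolar example the running sum only ever takes the values 0 and 1, so the variation is 1; without polarity balancing the running sum drifts and the variation grows with the length of the sequence.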

It  can  be  shown  that  4B-3T  has 


DSV  = 7 
A more sophisticated scheme can be obtained by allowing the codewords to
have several different lengths. In Franaszek's VL43 code (Franaszek,
1968), binary data are framed in blocks of 4 or 8 bits, each of them
being coded into a ternary block with 3 or 6 symbols, respectively. For
this code, the value of DSV is 4.

B. Non-alphabetic ternary codes

Many non-alphabetic ternary coding schemes have been proposed for use
in cable PCM (see Croisier, 1970, and references therein).

A scheme which is actually used is the bipolar-with-six-zero-substitution
(B6ZS) code (Davis, 1969). The basic idea is the following: the source
sequence is first encoded using bipolar code, then every sequence of 6
consecutive zeros is substituted by a "filling sequence". Since the
filling sequences must be recognized by the receiver, they must contain
a bipolar violation; i.e., a pulse whose polarity is the same as that of
the last nonzero pulse. In B6ZS, the filling sequence is

      B0VB0V

where B represents a normal bipolar pulse, and V a bipolar violation
pulse. As an example, the sequence ...1010000000110... can be coded as
...+0-+0+-0-0+-0..., where the filling sequence (underlined in the
original) is +0+-0-.
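The substitution rule can be sketched as a short encoder. This is a minimal illustration that follows the description given in the text (bipolar coding, with each run of six zeros replaced by the pattern B0VB0V as read from it); it is not taken from a standards document, and the function names and the `first_polarity` argument are assumptions.

```python
def b6zs_encode(bits, first_polarity=+1):
    """Bipolar coding in which every run of six zeros is replaced by B0VB0V."""
    bits = list(bits)
    out = []
    last = -first_polarity            # polarity of the most recent nonzero pulse
    i = 0
    while i < len(bits):
        if bits[i:i + 6] == [0] * 6:
            b1 = -last                # B: normal bipolar pulse
            v1 = b1                   # V: violation, same polarity as the last pulse
            b2 = -v1                  # B
            v2 = b2                   # V
            out += [b1, 0, v1, b2, 0, v2]
            last = v2
            i += 6
        elif bits[i] == 1:
            last = -last              # ordinary bipolar alternation
            out.append(last)
            i += 1
        else:
            out.append(0)
            i += 1
    return out

def to_string(symbols):
    return "".join("+" if s > 0 else "-" if s < 0 else "0" for s in symbols)

example = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(to_string(b6zs_encode(example)))
```

With the first pulse taken as positive, the sketch reproduces the coded sequence of the example in the text; the seventh zero of the run is left untouched because only complete groups of six zeros are substituted.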

C.  Multilevel  Codes  (M>3) 

In digital microwave transmission systems, using either terrestrial
radio links or satellite channels, the need for modulation schemes that
use bandwidth and power efficiently has led to the extensive use of
phase-shift keying (PSK). In our framework, we can represent M-ary PSK
($M = 2^m$) in this way: the binary source sequence is first framed into
blocks of $m = \log_2 M$ symbols. To each block, the coder associates
the complex number, or "phase",

$$ e^{j\ell(2\pi/M)}, \qquad \ell = 0, 1, \ldots, M-1 $$

The complex envelope of modulator output is then

$$ e^{j\ell(2\pi/M)} \, s(t) $$

where

$$ s(t) = \begin{cases} 1, & |t| < T/2 \\ 0, & \text{elsewhere.} \end{cases} $$

PSK works very well under not-too-severe conditions. However, since
the power spectrum of a PSK signal exhibits sidelobes that can interfere
with neighboring channels, a certain amount of filtering is necessary
after the modulator. This filtering results in a great amount of
envelope variations in the signal, which pose considerable problems due
to the nonlinear elements usually present in a communication system. In
fact, limiters, upconverters, traveling-wave tube amplifiers, etc.,
operated at or near saturation for better efficiency, have AM-PM
conversion effects which may seriously impair system performance.

An important fact is that, as I shall show later, the degrading effects
of nonlinear devices such as those encountered in practical systems are
only envelope-dependent. Thus, modified PSK schemes which exhibit nearly
constant envelope after filtering are called for.

It has been observed that, in conventional PSK, the most critical
situation, with respect to envelope fluctuations, occurs when the phase
of the transmitted signal incurs a 180° change. In correspondence to
such a transition, the signal envelope usually drops to very small
values (possibly zero).
To avoid this unwanted effect, several schemes have been devised.
The basic idea underlying all such schemes is simple: to avoid
transitions between signal points that are symmetric with respect to the
origin of signal space. Let us see how modulation and coding schemes can
be designed in order to meet this requirement (see also Ajmone and
Biglieri, 1977).

4-phase offset PSK (see Gronemeyer and McBride, 1976, and references therein)

Observing  Fig.  1.6,  we  see  that,  for  a phase  transition  of  180°  to  take 
place,  it  is  necessary  that  both  in-phase  and  quadrature  parts  of  the  signal 
change  sign  simultaneously.  To  avoid  this,  we  can  modify  a PSK  signal  by 
staggering  its  in-phase  and  quadrature  parts  according  to  the  scheme  of 
Fig.  1.6  . 


Fig. 1.6 - Binary data stream and the corresponding in-phase and
quadrature streams of ±1 symbols (duration T) for PSK and for offset PSK

This  is  actually  equivalent  to  the  following  (uncoded)  modulation  scheme: 


      binary source pair        modulator waveform

             00                  s(t) + j s(t-T/2)
             01                  s(t) - j s(t-T/2)
             10                 -s(t) + j s(t-T/2)
             11                 -s(t) - j s(t-T/2)

where s(t)=1 for |t|<T/2 and 0 elsewhere.
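The effect of staggering on the phase transitions can be checked with a small simulation. The sketch below is an idealized, unfiltered model under stated assumptions: each ±1 symbol is held for two half-symbol slots, the quadrature stream is delayed by one slot (T/2) in the offset case, and the phase jump is measured between consecutive slots. The helper names `hold` and `worst_phase_jump` are illustrative.

```python
import cmath
import math
import random

def hold(symbols, delay_slots=0):
    """Hold each +/-1 symbol for two half-symbol slots, optionally delayed."""
    held = [s for s in symbols for _ in range(2)]
    return [held[0]] * delay_slots + held[:len(held) - delay_slots]

def worst_phase_jump(i_samples, q_samples):
    """Largest absolute phase change, in degrees, between consecutive slots."""
    points = [complex(i, q) for i, q in zip(i_samples, q_samples)]
    return max(abs(math.degrees(cmath.phase(cur / prev)))
               for prev, cur in zip(points, points[1:]))

random.seed(1)
i_symbols = [random.choice((-1, 1)) for _ in range(200)]
q_symbols = [random.choice((-1, 1)) for _ in range(200)]

psk_jump = worst_phase_jump(hold(i_symbols), hold(q_symbols))
offset_jump = worst_phase_jump(hold(i_symbols), hold(q_symbols, delay_slots=1))
print(round(psk_jump), round(offset_jump))
```

In the offset case the in-phase and quadrature parts never change sign in the same slot, so the largest jump is 90 degrees, whereas conventional 4-phase PSK also shows 180 degree jumps.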

Similar  results  can  be  obtained  by  coding  the  source  sequence;  an 
example  is  described  below. 

Tamm-Danilov codes (Tamm and Danilov, 1968)

The basic idea with these codes is to use only a subset of the
available symbols for coding; which subset is actually used depends on
the source symbol. As an example, assume that six coded symbols are
available, namely:

$$ e^{j\ell(2\pi/6)}, \qquad \ell = 0, 1, \ldots, 5 $$

and  the  coder  operates  according  to  the  following  (differential)  rule: 


[Table: binary source pair $a_n$ (00, 01, 10, 11) versus coded symbol
$b_n$, each entry being $b_{n-1}$ multiplied by a phase factor; the
individual phase factors are not legible in the source.]

It  can  be  easily  seen  from  the  coding  rule  that  180°  transitions  cannot 
take  place. 
CHAPTER 2 - THE DESIGN OF CODING AND MODULATION SCHEMES


2.1 TIME-DOMAIN AND FREQUENCY-DOMAIN CONSTRAINTS

As  we  have  seen  through  several  examples  presented  in  Chapter  1,  two 
different  philosophies  are  involved  in  the  design  of  coding  and  modulation 
schemes  for  channels  with  memory. 

If  we  look  at  the  discrete  channel  formed  by  modulator,  transmission 
medium  and  demodulator,  the  goal  of  the  coding  scheme  will  be  to  restrict 
the  symbol  stream  to  form  only  "good"  sequences;  i.e.,  sequences  that  are 
well  matched  to  the  channel  structure. 

If we look instead at the continuous channel, or transmission medium,
the goal will be to create the best possible discrete channel. This is
obtained by constraining the transmitted signal to match the medium
characteristics. A typical approach is to concentrate the transmitted
signal power into spectral regions where the channel behavior is better.

In  this  chapter,  we  shall  see  some  general  techniques  useful  for  the 
analysis  and  the  design  of  coding  and  modulation  schemes. 

2.2 TIME-DOMAIN CONSTRAINTS

The approach taken is based on the use of finite-state machines (see,
e.g., Gill, 1962) as models of the constraints we want to put on the
coded sequence (as an example, we may want to bound the DSV, or the
number of consecutive zeros).

Consider, for instance, the constraint that no more than two 0's may
be transmitted successively in a binary code. We can represent this
constraint using the three-state machine whose state-transition diagram
is depicted in Fig. 2.1.

Fig. 2.1 - State-transition diagram of the three-state machine


To  each  source  symbol,  a transition  between  states  is  associated 
which  accounts  for  the  constraints  we  put  on  the  coded  sequence.  The 
mechanism  of  these  transitions  is  described  by  a finite-state 
machine. 

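We can represent an N-state machine by giving its "skeleton matrix"
$D = (d_{ij})$; i.e., an N×N matrix whose entries are given by

$$ d_{ij} = \begin{cases} 1, & \text{if transition from state } s_i \text{ to state } s_j \text{ is allowed} \\ 0, & \text{elsewhere} \end{cases} \qquad (2.1) $$

and the N×N "symbol-transition matrix" $\Gamma = (\gamma_{ij})$, whose
entries are

$$ \gamma_{ij} = \begin{cases} \text{symbol associated with transition } s_i \to s_j, & \text{if } d_{ij} = 1 \\ \phi, & \text{if } d_{ij} = 0 \end{cases} \qquad (2.2) $$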

In our previous example, we have

$$ D = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \qquad (2.3) $$


Clearly, D is completely determined by $\Gamma$, so we can say that
$\Gamma$ characterizes the constraints put on the coded sequence. Once
these constraints are given, we may ask: how much information can be
carried by this sequence? More precisely, we may want to compute the
number of bits per symbol carried by the sequence (clearly, the more
severe the constraints, the less information will be carried by each
symbol).

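Define

      $m_\nu$ = the number of allowable sequences of length $\nu$;

then the maximum rate of the channel, i.e., the maximum number of bits
that can be carried by each coded symbol is (Shannon, 1948)

$$ C = \lim_{\nu \to \infty} \frac{1}{\nu} \log_2 m_\nu $$

$$ \frac{1}{\nu} \left[ \log_2 \lambda(\nu) - 2 \log_2 (N+1) \right] \le C \le \frac{1}{\nu} \log_2 \lambda(\nu), \qquad \nu = 1, 2, \ldots \qquad (2.7) $$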


The bounds given by (2.7) become tighter as $\nu$ increases, so that C
can be approximated within any desired accuracy.
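As a numerical illustration, the maximum rate for the constraint of Fig. 2.1 can be estimated directly from the skeleton matrix (2.3). The sketch below does not use the bounds (2.7); it relies instead on the classical result that, for a finite-state constraint of this kind, C equals the base-2 logarithm of the largest eigenvalue of D, and it cross-checks this against a direct count of allowable sequences. The function names, the choice of starting state, and the number of power-method iterations are assumptions of the sketch.

```python
import math

# Skeleton matrix (2.3): no more than two consecutive 0's.
D = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]

def count_sequences(D, length, start=0):
    """Number of allowable sequences of the given length, starting in state `start`."""
    n = len(D)
    counts = [1 if i == start else 0 for i in range(n)]
    for _ in range(length):
        counts = [sum(counts[i] * D[i][j] for i in range(n)) for j in range(n)]
    return sum(counts)

def capacity(D, iterations=200):
    """log2 of the largest eigenvalue of D, estimated by power iteration."""
    n = len(D)
    x = [1.0] * n
    growth = 1.0
    for _ in range(iterations):
        y = [sum(D[i][j] * x[j] for j in range(n)) for i in range(n)]
        growth = max(y)
        x = [value / growth for value in y]
    return math.log2(growth)

print([count_sequences(D, v) for v in range(1, 6)])
print(capacity(D), math.log2(count_sequences(D, 60)) / 60)
```

The count for length 3 is 7, in agreement with the fact that only the triplet 000 is excluded, and the normalized logarithm of the count approaches the eigenvalue-based value as the length grows.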

If R denotes the information rate of a code designed to transmit on
a channel with maximum rate C, the efficiency of the code can be defined
as

$$ \eta = \frac{R}{C} \qquad (2.8) $$

As an example, upper bounds on the efficiency of codes designed by
putting limits on the DSV (see Section 1.3) can be found in Chien (1970).

Consider now the sequences that can be produced by a finite-state
machine characterized by a symbol-transition matrix $\Gamma$. The set of
sequences of length $\nu$ can be obtained (Franaszek, 1970) by taking
the $\nu$-th power of the matrix $\Gamma$, where ordinary sums and
products are substituted by the operations of disjunction ($\vee$) and
concatenation, respectively. Concatenation of the null symbol $\phi$
with any symbol results in $\phi$.

As an example, consider $\nu = 3$ and $\Gamma$ given by (2.4). The
possible triplets of symbols are given by

$$ \Gamma^3 = \begin{pmatrix} 111 \vee 011 \vee 101 \vee 001 & 110 \vee 010 & 100 \\ 111 \vee 011 \vee 101 & 110 \vee 010 & 100 \\ 111 \vee 101 & 110 & 100 \end{pmatrix} \qquad (2.10) $$

The set of possible $\nu$-tuples is then obtained by taking the
disjunction of all the entries of $\Gamma^\nu$. In our example, the
allowable triplets are all the possible triplets of binary symbols,
except 000.
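The symbolic matrix power can be mimicked with sets of strings: disjunction becomes set union, concatenation becomes string concatenation, and the null symbol becomes the empty set. In the sketch below the matrix `GAMMA` is an assumption: it is written out from the skeleton matrix (2.3) and the state diagram described for Fig. 2.1 (states ordered by the number of 0's just transmitted), since equation (2.4) itself is not reproduced here. All names are illustrative.

```python
PHI = frozenset()          # null symbol: no allowed transition

# Assumed symbol-transition matrix for the "at most two consecutive 0's" machine.
GAMMA = [[{"1"}, {"0"}, PHI],
         [{"1"}, PHI, {"0"}],
         [{"1"}, PHI, PHI]]

def sym_product(A, B):
    """Matrix product with union in place of sum and concatenation in place of product."""
    n = len(A)
    return [[{u + w for k in range(n) for u in A[i][k] for w in B[k][j]}
             for j in range(n)] for i in range(n)]

def sym_power(G, v):
    result = G
    for _ in range(v - 1):
        result = sym_product(result, G)
    return result

G3 = sym_power(GAMMA, 3)
triplets = set().union(*(entry for row in G3 for entry in row))
for row in G3:
    print([" v ".join(sorted(entry, reverse=True)) or "phi" for entry in row])
print(sorted(triplets))
```

Because concatenating with the empty set yields nothing, forbidden transitions drop out automatically, and the union of all entries of the third power contains every binary triplet except 000.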
Design of alphabetic codes


Suppose  the  source  sequence  is  partitioned  into  blocks  of  length  N, 
each  one  being  mapped  by  the  coder  into  a block  of  length  N'  . 

The choice of the codeword used to represent a given source block
may be a function of the state to which the finite-state machine
representing the coder is taken. In this case, the code is said to be
state-dependent. Or the code may be state-independent, which means that
codewords can be freely concatenated without violating the constraints.

Consider a simple example; each one of these codewords:

      1 0 0
      0 1 0
      0 0 1
      1 1 0
      1 0 1
      0 1 1
      1 1 1                                   (2.11)


satisfies the constraints of Fig. 2.1 (no more than two 0's may be
transmitted successively); but a concatenation of the first and second
codeword, for example, would violate them. To get a set of codewords
that does not violate the constraint, we must reduce the set (2.11) to
5 elements:

      1 1 1
      1 0 1
      0 1 1
      1 1 0
      0 1 0                                   (2.12)
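The claim about free concatenation can be checked mechanically. The sketch below tests every ordered pair of codewords from each set; checking pairs is enough here because neither set contains the all-zero word, so a run of zeros can never span more than two codewords. The function names are illustrative.

```python
from itertools import product

def longest_zero_run(word):
    """Length of the longest run of consecutive 0's in a binary string."""
    return max(len(run) for run in word.split("1"))

def freely_concatenable(codewords, max_zeros=2):
    """True if every ordered pair of codewords respects the run-length limit."""
    return all(longest_zero_run(a + b) <= max_zeros
               for a, b in product(codewords, repeat=2))

SET_2_11 = ["100", "010", "001", "110", "101", "011", "111"]
SET_2_12 = ["111", "101", "011", "110", "010"]

print(freely_concatenable(SET_2_11), freely_concatenable(SET_2_12))
print(longest_zero_run("100" + "010"))
```

The reduced set works because none of its words begins or ends with more than one 0, so at most two 0's can meet at a codeword boundary.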


A method for designing an (N,K) alphabetic code has been proposed by
Franaszek (1968, 1969, 1970); see also Freiman and Wyner (1964). First,
for N and K fixed, a recursive search technique is used (Franaszek,
1969) to determine the existence of a set of principal states. These are
states from each of which there exist at least $B^K$ paths terminating
at other principal states. The existence of a set of principal states is
a necessary and sufficient condition for the existence of an (N,K)
alphabetic code. The words available for encoding are the paths of
length N connecting the principal states; these words can be obtained
from the N-th power of the matrix $\Gamma$. An example will illustrate
this procedure.

Example (Franaszek, 1970)

Consider a binary code for a binary source (B=m=2), and

$$
D = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix},
\qquad
\Gamma = \begin{bmatrix} \phi & 0 & \phi & \phi \\ 1 & \phi & 0 & \phi \\ 1 & \phi & \phi & 0 \\ 1 & \phi & \phi & \phi \end{bmatrix}
$$

(see Fig. 2.2). For these constraints, we have C ≈ 0.552.

Let K=1, N=2, and look for the existence of a set of principal states.
To do this, compute first $D^N = D^2$:

$$
D^2 = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
$$

It is easily seen that {S1,S2,S3} is a set of principal states,
from each of which there are $2 = B^K$ paths of length 2 terminating at
other principal states. So a (2,1) code exists, with an efficiency of
approximately 90%.
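The search for principal states in this example can be sketched as follows. This is only an illustrative pruning loop, not Franaszek's recursive search itself, and the connection matrix `D` is entered as an assumption (it is the 0/1 matrix whose square reproduces the $D^2$ shown above). States are indexed from 0, so S1, S2, S3 correspond to 0, 1, 2.

```python
def mat_power(D, n):
    """n-th power of a square integer matrix (entry (i, j) counts n-step paths)."""
    size = len(D)
    R = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        R = [[sum(R[i][k] * D[k][j] for k in range(size)) for j in range(size)]
             for i in range(size)]
    return R


def principal_states(D, N, B, K):
    """Repeatedly discard states with fewer than B**K paths of length N
    leading to the surviving states; what remains (possibly empty) is returned."""
    paths = mat_power(D, N)
    states = set(range(len(D)))
    need = B ** K
    while True:
        weak = {i for i in states if sum(paths[i][j] for j in states) < need}
        if not weak:
            return states
        states -= weak


# ASSUMED connection matrix, consistent with the D^2 given in the text.
D = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 0]]

D2 = mat_power(D, 2)
PRINCIPAL = principal_states(D, N=2, B=2, K=1)
```

With these inputs the fourth state is discarded in the first pass (only one path of length 2 leaves it), and the three remaining states each keep exactly two paths into the surviving set.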

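Words of length 2 can be obtained from

$$
\Gamma^2 = \begin{bmatrix} 01 & \phi & 00 & \phi \\ 01 & 10 & \phi & 00 \\ 01 & 10 & \phi & \phi \\ \phi & 10 & \phi & \phi \end{bmatrix}
$$

The words available for encoding are the word sets associated with
the principal states:

    S1: 01, 00
    S2: 01, 10
    S3: 01, 10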

and a code can be constructed according to the following rule:

    binary symbol    state    codeword
          1           S1         01
          1           S2         01
          1           S3         01
          0           S1         00
          0           S2         10
          0           S3         10

It must be noted that, in general, there are several degrees of freedom in
the choice of codewords to be associated with source symbols. This freedom
may possibly be exploited to meet some other requirement (such as the
minimization of error probability).

2.3 A GENERAL TECHNIQUE FOR COMPUTING THE SPECTRUM OF A DIGITAL SIGNAL

In this analysis, I shall assume the following model for the generation
of digital signals to be sent through the transmission channel: every
T seconds, the modulator emits a waveform chosen in the set

$$\{q(t;k)\}_{k=1}^{M} \tag{2.13}$$

where $q(t;k)$, $1 \le k \le M$, are complex functions of time. Thus, the signal
at the output of the modulator is

$$x(t) = \sum_{n=-\infty}^{\infty} q(t-nT;\xi_n) \tag{2.14}$$

where $(\xi_n)_{n=-\infty}^{\infty}$ is a wide-sense stationary sequence of random variables
taking values in the set {1,2,...,M}.

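With these assumptions, let us compute the power spectrum of the
random process x(t). As shown in Appendix A, we need first to compute
the Fourier transform of x(t), which is given by

$$X(\omega) = \sum_{n=-\infty}^{\infty} Q(\omega;\xi_n)\, e^{-jn\omega T} \tag{2.15}$$

where $Q(\omega;i)$, $1 \le i \le M$, are the Fourier transforms of $q(t;i)$, $1 \le i \le M$.

The function $\Gamma(\omega_1,\omega_2)$ can be computed according to (A.4), which
gives

$$\Gamma(\omega_1,\omega_2) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} E\{Q(\omega_1;\xi_n)\,Q^*(\omega_2;\xi_m)\}\, e^{-j(n\omega_1 - m\omega_2)T} \tag{2.16}$$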


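We are assuming that the sequence $(\xi_n)_{n=-\infty}^{\infty}$ is wide-sense stationary;
thus, the average in (2.16) will depend only on the difference n-m.
Defining

$$\rho(\omega_1,\omega_2;n-m) \triangleq E\{Q(\omega_1;\xi_n)\,Q^*(\omega_2;\xi_m)\} \tag{2.17}$$

we can rewrite (2.16) as

$$\Gamma(\omega_1,\omega_2) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \rho(\omega_1,\omega_2;n-m)\, e^{-j(n-m)\omega_1 T}\, e^{-jm(\omega_1-\omega_2)T} \tag{2.18}$$

With the change $\ell = n-m$ in summation indices, and using the equality

$$\sum_{k=-\infty}^{\infty} e^{-jkxy} = \frac{2\pi}{x} \sum_{k=-\infty}^{\infty} \delta\!\left(y - k\,\frac{2\pi}{x}\right) \tag{2.19}$$

Eq.(2.18) takes the form

$$\Gamma(\omega_1,\omega_2) = \frac{2\pi}{T} \sum_{m=-\infty}^{\infty} \sum_{\ell=-\infty}^{\infty} \rho(\omega_1,\omega_2;\ell)\, e^{-j\ell\omega_1 T}\, \delta\!\left(\omega_1-\omega_2 - m\,\frac{2\pi}{T}\right) \tag{2.20}$$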

Comparing (2.20) to (A.3), we see that the function $\Gamma(\omega_1,\omega_2)$ has
only linear masses lying parallel to the bisector of the plane $(\omega_1,\omega_2)$
(actually, the process x(t) is cyclostationary). The power spectrum of
x(t) is thus given by

$$\mathcal{G}(\omega) = \frac{1}{T} \sum_{\ell=-\infty}^{\infty} \rho(\omega;\ell)\, e^{-j\ell\omega T} \tag{2.21}$$

where

$$\rho(\omega;\ell) \triangleq \rho(\omega,\omega;\ell). \tag{2.22}$$

Notice that the series (2.21) may not converge in the usual sense.
In fact, we may extract the so-called discrete part of the spectrum,
defining

$$\rho(\omega;\infty) \triangleq \lim_{\ell\to\infty} \rho(\omega;\ell) = |E\{Q(\omega;\xi_n)\}|^2 \tag{2.23}$$

(Of course, we must make the assumption that the limit exists; this condition,
as we shall see soon, is met if the sequence $(\xi_n)_{n=-\infty}^{\infty}$ forms a regular
Markov chain.) Then, if $\mathcal{G}_c(\omega)$ and $\mathcal{G}_d(\omega)$ represent the continuous and
the discrete part (line spectrum) of $\mathcal{G}(\omega)$, respectively, we have, using
(2.19) once more:

$$
\begin{aligned}
\mathcal{G}(\omega) &= \mathcal{G}_c(\omega) + \mathcal{G}_d(\omega) \\
\mathcal{G}_c(\omega) &= \frac{1}{T} \sum_{\ell=-\infty}^{\infty} [\rho(\omega;\ell) - \rho(\omega;\infty)]\, e^{-j\ell\omega T} \\
\mathcal{G}_d(\omega) &= \frac{2\pi}{T^2}\, \rho(\omega;\infty) \sum_{\ell=-\infty}^{\infty} \delta\!\left(\omega - \ell\,\frac{2\pi}{T}\right)
\end{aligned} \tag{2.24}
$$

where

$$\rho(\omega;\ell) = E\{Q(\omega;\xi_{\ell+k})\,Q^*(\omega;\xi_k)\}$$

It can be noted that the line spectrum (which is often an unwanted feature)
appears only if $E\{Q(\omega;\xi_n)\} \ne 0$.

Example (linear modulation)

Assume that the transmitted signal is of the form

$$x(t) = \sum_{n=-\infty}^{\infty} c_n\, q(t-nT), \qquad \text{where } E\{c_n\} = 0. \tag{2.25}$$

Then

$$\rho(\omega;\ell) = E\{c_{\ell+k}\, c_k^*\}\,|Q(\omega)|^2 = \rho_\ell\, |Q(\omega)|^2 \tag{2.26}$$

where $(\rho_\ell)_{\ell=-\infty}^{\infty}$ is the covariance sequence of the process $(c_n)_{n=-\infty}^{\infty}$,
and $Q(\omega)$ is the Fourier transform of the pulse q(t); hence

$$\mathcal{G}(\omega) = \mathcal{G}_c(\omega) = \frac{1}{T}\, |Q(\omega)|^2 \sum_{\ell=-\infty}^{\infty} \rho_\ell\, e^{-j\ell\omega T} \tag{2.27}$$

It can be seen from (2.27) that the spectrum expression is made up
of two factors. The first one, $|Q(\omega)|^2$, depends only on the shape of the
basic pulse used for modulation. The other factor, an infinite summation,
depends only on the correlation between coded symbols. Under the assumption
that source symbols are independent and identically distributed, this factor
accounts for the code only.
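As a numerical illustration of (2.27), the sketch below evaluates the spectrum for a covariance sequence with finitely many nonzero lags. The unit-amplitude rectangular pulse, the function names and the two test sequences are all assumptions chosen for the illustration: `IID` is an uncorrelated unit-variance sequence, and `TWINNED` is the covariance of $c_n = a_n - a_{n-1}$ with uncorrelated unit-variance $a_n$.

```python
import cmath
import math


def rect_pulse_ft_sq(omega, T):
    """|Q(omega)|^2 for an ASSUMED unit-amplitude rectangular pulse on |t| < T/2."""
    x = omega * T / 2.0
    return T * T if x == 0.0 else (T * math.sin(x) / x) ** 2


def linear_mod_spectrum(omega, T, rho):
    """Eq. (2.27): (1/T) |Q|^2 sum_l rho_l exp(-j l omega T).

    rho maps each lag l to rho_l and must satisfy rho[-l] = conj(rho[l]),
    so that the sum is real.
    """
    series = sum(r * cmath.exp(-1j * lag * omega * T) for lag, r in rho.items())
    return rect_pulse_ft_sq(omega, T) * series.real / T


IID = {0: 1.0}                          # uncorrelated symbols
TWINNED = {-1: -1.0, 0: 2.0, 1: -1.0}   # covariance of c_n = a_n - a_{n-1}

dc_iid = linear_mod_spectrum(0.0, 1.0, IID)
dc_twinned = linear_mod_spectrum(0.0, 1.0, TWINNED)
```

The first factor is shared by both sequences; only the summation changes, which is why the correlated sequence shows a null at zero frequency while the uncorrelated one does not.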

2.4  SOME  EXAMPLES  OF  APPLICATION 

The computation of the covariance may not be an easy task; so Eq.(2.27),
and more generally Eq.(2.24), may be difficult to apply in the actual computation
of power spectra. We shall see later how, with some additional requirements
on the sequence $(\xi_n)_{n=-\infty}^{\infty}$, a computationally practical technique
can be devised. In this section, we shall restrict our attention to some
special cases in which (2.24) can be applied.

Consider first the important special case that arises when the random
variables $\xi_n$ are independent and identically distributed. This situation
occurs, with our assumptions, when no coding is performed on the source
sequence. We shall now see, through an example, how a modulation scheme
can be designed on the basis of frequency constraints.

Consider digital transmission over a satellite channel. A sensible
criterion in the choice of the modulation scheme turns out to be its spectrum
occupancy. In fact, for most planned or implemented satellite links,
data streams from users are assigned adjacent frequency bands, or "channels",
that interfere with each other to a larger or lesser extent depending
on the bandwidth occupancy of the modulated signals.

For PSK modulation, and in general for radio systems, bandwidth
occupancy can be defined as the frequency interval that contains a specified
fraction of the total radio-frequency power. Since bandwidth is a measure
of adjacent channel interference, it should be kept to a minimum.

Consider first conventional quaternary (i.e., M=4) PSK. It fits
model (2.14) provided that we define

$$q(t;\xi_k) = e^{j\theta_k}\, s(t) \tag{2.28}$$

where

$$s(t) = \begin{cases} 1 & |t| < T/2 \\ 0 & \text{elsewhere} \end{cases} \tag{2.29}$$

and

$$\theta_k = (2\xi_k - 1)\,\pi/4 \tag{2.30}$$

If the random variables $\xi_k$ are independent, identically distributed and
take values in {1,2,3,4} with equal probabilities, we get from (2.27):

$$\mathcal{G}(\omega) = T \left( \frac{\sin(\omega T/2)}{\omega T/2} \right)^2 \tag{2.31}$$


Consider now offset PSK, as defined in §1.3.
We can write

$$q(t;\xi_k) = e'_k\, s(t) + j\, e''_k\, s(t-T/2) \tag{2.32}$$

where s(t) is again given by (2.29), and $e'_k$, $e''_k$ are obtained from $\xi_k$
according to the following rule:

    ξ_k      e'_k      e''_k
     1       1/√2       1/√2
     2      -1/√2       1/√2
     3      -1/√2      -1/√2
     4       1/√2      -1/√2

It turns out that $e'_k$, $e''_k$ are independent, identically distributed random
variables.

Taking the Fourier transform of $q(t;\xi_k)$, we get

$$Q(\omega;\xi_k) = T\, \frac{\sin(\omega T/2)}{\omega T/2} \left( e'_k + j\, e''_k\, e^{-j\omega T/2} \right) \tag{2.33}$$

and, consequently,

$$\rho(\omega;\ell) = E\{Q(\omega;\xi_{\ell+k})\,Q^*(\omega;\xi_k)\} = \begin{cases} T^2 \left( \dfrac{\sin(\omega T/2)}{\omega T/2} \right)^2 & \ell = 0 \\ 0 & \ell \ne 0 \end{cases} \tag{2.34}$$

so that offset PSK is spectrally equivalent to PSK.


Consider now a more general scheme of offset PSK, that I shall call
Shaped Offset PSK. Assume that

$$q(t;\xi_k) = e'_k\, f(t) + j\, e''_k\, f(t-T/2) \tag{2.35}$$

where f(t) is a pulse shape of T seconds duration, such that the signal
x(t) (see (2.14)) has constant envelope. (This is a feature of PSK signals
that we may want to keep.) If $e'_k$, $e''_k$ are obtained from $\xi_k$ according
to the same rule prescribed for offset PSK, we can easily get, for the
spectrum of shaped offset PSK:

$$\mathcal{G}(\omega) = \frac{1}{T}\, |F(\omega)|^2 \tag{2.36}$$

where $F(\omega)$ is the Fourier transform of f(t).

We would like to find f(t), defined in the interval (-T/2,T/2)
and giving a constant envelope x(t), such that the power of x(t) is a
maximum in a given frequency band.

Without the constant-envelope restriction, this problem has a classical
solution offered by truncated versions of the prolate spheroidal wave
functions (see Landau and Pollak, 1961). With this further constraint, the
problem is unsolved, so we must restrict ourselves to a less ambitious goal.
Consider the choice

$$f(t) = \cos\!\left[ \frac{\pi}{T}\, t + \phi(t) \right], \qquad |t| < T/2 \tag{2.37}$$

which leads to a constant envelope x(t) if $\phi(t)$ is periodic with period
T/2. Choosing $\phi(t)$ in order to get a sharp roll-off of $\mathcal{G}(\omega)$ as $\omega \to \infty$,
we may thus expect that more power is concentrated in the neighborhood of
the origin.
Observe that, if f(t) is L-times differentiable, integrating by
parts we get, as $\omega \to \infty$,

$$F(\omega) = -\sum_{\ell=0}^{L} \frac{1}{(j\omega)^{\ell+1}} \left[ f^{(\ell)}(t)\, e^{-j\omega t} \right]_{t=-T/2}^{t=T/2} + O(\omega^{-L-2}) \tag{2.38}$$

i.e., the behavior of $F(\omega)$, and hence $|F(\omega)|^2$, as $\omega$ grows to infinity,
depends on the values of the derivatives of f(t) at the edges of the interval
(-T/2,T/2). In particular, if we put

$$f^{(\ell)}(T/2) = f^{(\ell)}(-T/2) = 0, \qquad 0 \le \ell \le L, \tag{2.39}$$

then

$$|F(\omega)|^2 = O(\omega^{-2L-4}), \qquad \omega \to \infty \tag{2.40}$$

Expand $\phi(t)$ in a Fourier series:

$$\phi(t) = a_1 \cos\frac{4\pi}{T} t + a_2 \cos\frac{8\pi}{T} t + \dots + b_1 \sin\frac{4\pi}{T} t + b_2 \sin\frac{8\pi}{T} t + \dots \tag{2.41}$$

We may choose the Fourier coefficients in order to achieve a prescribed
number of zero derivatives, i.e., a prescribed asymptotic behavior of the
spectrum. If all coefficients are zero, i.e.,

$$\phi(t) = 0 \tag{2.42}$$

we get

$$\mathcal{G}(\omega) = O(\omega^{-4}) \tag{2.43}$$

and this modulation scheme is known as MSK. Using (2.36), the MSK spectrum
can be easily computed (see also Simon, 1976):

$$\mathcal{G}(\omega) = 4\pi^2 T \left( \frac{\cos(\omega T/2)}{\pi^2 - \omega^2 T^2} \right)^2 \tag{2.44}$$
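The closed form for MSK can be cross-checked against (2.36) by integrating the pulse numerically. The sketch below is an illustration under stated assumptions: the pulse is $f(t) = \cos(\pi t/T)$ on (-T/2, T/2), the midpoint rule and the step count are arbitrary choices, and the function names are hypothetical. The closed form is a 0/0 expression at $\omega T = \pi$, so that point is avoided.

```python
import cmath
import math


def msk_spectrum_closed_form(omega, T):
    """4 pi^2 T (cos(omega T/2) / (pi^2 - omega^2 T^2))^2, for omega*T != pi."""
    return 4 * math.pi ** 2 * T * (
        math.cos(omega * T / 2) / (math.pi ** 2 - (omega * T) ** 2)) ** 2


def spectrum_from_pulse(f, omega, T, steps=20000):
    """(1/T)|F(omega)|^2, with F computed by the midpoint rule on (-T/2, T/2)."""
    dt = T / steps
    F = 0j
    for i in range(steps):
        t = -T / 2 + (i + 0.5) * dt
        F += f(t) * cmath.exp(-1j * omega * t)
    F *= dt
    return abs(F) ** 2 / T


T = 1.0


def msk_pulse(t):
    """ASSUMED MSK pulse: phi(t) = 0 in (2.37)."""
    return math.cos(math.pi * t / T)


numeric_dc = spectrum_from_pulse(msk_pulse, 0.0, T)
closed_dc = msk_spectrum_closed_form(0.0, T)
```

Passing a different pulse to `spectrum_from_pulse` gives a quick way to compare the roll-off of other constant-envelope shapes with that of MSK.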


Taking $a_1 = a_2 = \dots = b_2 = b_3 = \dots = 0$ and $b_1 = -\frac{1}{4}$,
we obtain

$$\mathcal{G}(\omega) = O(\omega^{-6}) \tag{2.45}$$

and a modulation scheme known as SFSK (Amoroso, 1976). Other schemes are
presented by Ajmone and Biglieri (1977), where a comparison between the actual
power spectra of PSK, MSK, SFSK and other schemes is reported.

As another application, consider a linear modulator (see (2.25))
driven by a linearly coded sequence $(c_n)_{n=-\infty}^{\infty}$; this sequence is obtained
from the source sequence $(a_n)_{n=-\infty}^{\infty}$ according to the rule

$$c_n = \Gamma(D)\, a_n \tag{2.46}$$

where D is the delay operator, and $\Gamma(\cdot)$ is a polynomial.

If $(a_n)$ is a sequence of zero-mean independent, identically distributed
random variables, with a proper scaling of their amplitudes we can
assume that its correlation sequence is equal to (...,0,0,1,0,0,...). In
this case, the correlation sequence of $(c_n)$ has a D-transform given by

$$\sum_{\ell=-\infty}^{\infty} \rho_\ell\, D^\ell = \Gamma(D^{-1})\,\Gamma(D) \tag{2.47}$$

so that, using (2.27):

$$\mathcal{G}(\omega) = \frac{1}{T}\, |Q(\omega)|^2\, \Gamma(e^{j\omega T})\,\Gamma(e^{-j\omega T}) \tag{2.48}$$

The term $\Gamma(e^{j\omega T})\,\Gamma(e^{-j\omega T})$ accounts for the spectrum shaping achieved
by coding. An appropriate choice of the polynomial $\Gamma(D)$ will lead to the desired
spectrum shaping.

For example, the condition $\Gamma(1)=0$ will lead to a zero at the origin.
The polynomial $\Gamma(D)=1-D$ (giving rise to the twinned binary code, see §1.3)
satisfies this condition.

Similarly, the spectrum has a zero at the origin as well as at $\omega=\pi/T$
if $\Gamma(1)=\Gamma(-1)=0$. This condition is met by the polynomial $\Gamma(D) = 1-D^2$
(class-IV partial response).
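The location of the spectral zeros produced by these two polynomials can be checked directly. The sketch below (names are hypothetical; real coefficients are assumed, so that the shaping term equals a squared magnitude) evaluates $\Gamma(e^{j\omega T})\,\Gamma(e^{-j\omega T})$ at a given frequency.

```python
import cmath
import math


def shaping_factor(coeffs, omega, T):
    """Gamma(e^{j w T}) Gamma(e^{-j w T}) for real coefficients [g0, g1, ...]
    of Gamma(D) = g0 + g1 D + g2 D^2 + ...; equals |Gamma(e^{-j w T})|^2."""
    z = cmath.exp(-1j * omega * T)
    return abs(sum(g * z ** k for k, g in enumerate(coeffs))) ** 2


TWINNED_BINARY = [1, -1]     # Gamma(D) = 1 - D
CLASS_IV = [1, 0, -1]        # Gamma(D) = 1 - D^2

T = 1.0
twinned_at_dc = shaping_factor(TWINNED_BINARY, 0.0, T)
class4_at_dc = shaping_factor(CLASS_IV, 0.0, T)
class4_at_nyquist = shaping_factor(CLASS_IV, math.pi / T, T)
twinned_at_nyquist = shaping_factor(TWINNED_BINARY, math.pi / T, T)
```

The last value shows that 1 - D, unlike 1 - D², leaves the spectrum at $\omega = \pi/T$ untouched apart from a gain.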


2.5 DESIGN OF CODES IN THE FREQUENCY DOMAIN FOR LINEAR MODULATION


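Consider the situation represented in the example of §2.3 (linear
modulation). Assume that the sequence $(c_n)_{n=-\infty}^{\infty}$ is a real sequence; so
we obtain, if the code is linear,

$$\mathcal{G}(\omega) = \frac{1}{T}\, |Q(\omega)|^2\, |\Gamma(e^{j\omega T})|^2 \tag{2.49}$$

We want to find the coefficients of $\Gamma(\cdot)$ to yield a prescribed spectral
shaping

$$\mathcal{G}(\omega) = \frac{1}{T}\, |Q(\omega)|^2 \left[ \alpha_0 + 2 \sum_{\ell=1}^{\nu} \alpha_\ell \cos \ell\omega T \right] = \frac{1}{T}\, |Q(\omega)|^2\, C(\omega) \tag{2.50}$$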


The trigonometric polynomial $C(\omega)$ of (2.50) must be non-negative. Otherwise,
we would get negative values for $\mathcal{G}(\omega)$, the power spectrum of the
coded sequence. As a consequence of a discrete version of the Bochner
theorem (the Herglotz lemma: see, e.g., Loève, 1963), since $C(\omega)$ is a non-negative
trigonometric polynomial, from its coefficients $(\alpha_\ell)_{\ell=0}^{\nu}$ a finite-length
autocorrelation sequence $(\rho_\ell)_{\ell=-\infty}^{\infty}$ can be constructed according to
the rule

$$\rho_\ell = \begin{cases} \alpha_\ell & \ell = 0,1,\dots,\nu \\ \alpha_{-\ell} & \ell = -1,\dots,-\nu \\ 0 & \text{otherwise} \end{cases} \tag{2.51}$$

(see Gilchrist and Thomas, 1976).

Thus, a polynomial $\Gamma(\cdot)$ such that

$$|\Gamma(e^{j\omega T})|^2 = C(\omega) \tag{2.52}$$

can be found. In fact, due to a theorem on trigonometric polynomials
(Szegö, 1975), a polynomial $\Gamma(\cdot)$ of degree ν can always be constructed
such that (2.52) holds. This polynomial is generally not unique.
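The non-uniqueness is easy to see on a small case. In the sketch below (an illustration with hypothetical names and a hand-picked example, not a factorization algorithm), the two distinct polynomials 2 + D and 1 + 2D are shown to produce the same coefficients $\alpha_\ell$, and hence the same trigonometric polynomial $C(\omega) = 5 + 4\cos\omega T$.

```python
import cmath
import math


def poly_autocorr(gamma):
    """alpha_l = sum_k gamma[k] gamma[k+l], l = 0..nu, for real coefficients."""
    nu = len(gamma) - 1
    return [sum(gamma[k] * gamma[k + l] for k in range(nu + 1 - l))
            for l in range(nu + 1)]


def trig_poly(alpha, omega, T):
    """C(omega) = alpha_0 + 2 sum_{l>=1} alpha_l cos(l omega T)."""
    return alpha[0] + 2 * sum(a * math.cos(l * omega * T)
                              for l, a in enumerate(alpha) if l > 0)


def shaping(gamma, omega, T):
    """|Gamma(e^{j omega T})|^2 for Gamma(D) = gamma[0] + gamma[1] D + ..."""
    z = cmath.exp(1j * omega * T)
    return abs(sum(g * z ** k for k, g in enumerate(gamma))) ** 2


ALPHA_A = poly_autocorr([2, 1])   # Gamma(D) = 2 + D
ALPHA_B = poly_autocorr([1, 2])   # Gamma(D) = 1 + 2D
```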

Unfortunately, the coefficients of the coding polynomial $\Gamma(\cdot)$
need not be integer numbers, so that if the source alphabet has B symbols,
the coded sequence will take up to $B^{\nu+1}$ different values, which would make
the code impractical. Moreover, the problem of characterizing covariance
sequences of processes taking only a finite number of values has no solution
comparable in simplicity to that provided by the Herglotz lemma when
the restriction on allowed values is removed.

As an example, consider the class $\mathcal{U}$ of covariance sequences of
processes taking only the values ±1.

Let $\mathcal{V}_k$ denote the set of all k×k matrices $A=(a_{ij})$ with the
property that

$$\sum_{i=1}^{k} \sum_{j=1}^{k} x_i\, a_{ij}\, x_j \ge 0 \tag{2.53}$$

where $x_1,\dots,x_k \in \{+1,-1\}$. Then a sequence $(\rho_n)_{n=-\infty}^{\infty}$ belongs to $\mathcal{U}$ if
and only if (McMillan, 1955):

(i) $\rho_n$, $-\infty < n < \infty$, are real numbers such that $\rho_n = \rho_{-n}$ and $\rho_0 = 1$;

(ii) for any integer k > 0, any set $\{n_1, n_2, \dots, n_k\}$ of indices and any
matrix $A \in \mathcal{V}_k$, the following holds:

$$\sum_{i=1}^{k} \sum_{j=1}^{k} a_{ij}\, \rho_{n_i - n_j} \ge 0.$$
The design problem in the frequency domain can be further generalized
if the assumption of linearity of the code is relaxed, while the
hypothesis of linear modulation is maintained. Before stating the problem
in its generality, let us look at a simple example, which will provide
some motivation for the formulation.

Consider once again the twinned binary code. With this ternary code,
we get a zero in the spectrum at the origin but a rate of only one information
bit per coded symbol, whereas a rate of $\log_2 3 \approx 1.58$ bits/symbol
could be achieved by a ternary sequence of independent random variables.
Thus, a twinned binary code achieves a desirable feature in the spectrum
at a cost of about a 1/3 decrease in information rate.

So, it is reasonable to raise the question of whether any scheme
with the same spectrum as twinned binary could attain a greater rate.
Moreover, what is the greatest rate so achievable?

Stated in its most general terms, the problem is the following:

Consider a stationary, time-discrete random process whose variables
$(c_n)_{n=-\infty}^{\infty}$ take their values from a finite set of real numbers. Let
the process have mean zero and a given correlation sequence $(\rho_n)_{n=-\infty}^{\infty}$.
What is the largest entropy that this process can have, and what is
its probability structure?

Such a problem has been considered by Slepian (1972), and left
essentially unsolved.

2.6 COMPUTATION OF THE POWER SPECTRUM WHEN THE SEQUENCE $(\xi_n)$
FORMS A MARKOV CHAIN

So far, in the derivation of the power spectrum I have made only
the assumption that the sequence $(\xi_n)_{n=-\infty}^{\infty}$ is wide-sense stationary. A
further specialization of eqs.(2.24) can be obtained through the assumption
that such a sequence forms a regular, homogeneous Markov chain.
(Notice that, to meet this requirement, it is sometimes necessary to
assume that in the set (2.13) some of the waveforms are equal.)

This further assumption on the statistics of the sequence $(\xi_n)$
allows a powerful computational algorithm to be obtained for the derivation
of the power spectrum of a modulated signal.

To justify the assumption that the sequence $(\xi_n)$ can be modeled
as a Markov chain, it should suffice to observe that, if we model the
encoder as a finite-state machine driven by a stationary sequence of
independent source symbols, then the sequence of states of the machine
forms a homogeneous Markov chain.

The resulting closed-form expression for the spectrum should allow
an encoder and a modulator to be designed in order to achieve a given
spectral behavior of the transmitted signals. No results, however, are
available yet in this area.

The assumption that $(\xi_n)_{n=-\infty}^{\infty}$ is a homogeneous Markov chain means
that the one-step transition probabilities

$$\Pr\{\xi_{n+1} = j \mid \xi_n = i\}, \qquad i,j = 1,\dots,M \tag{2.54}$$

are independent of the value of n. In this case, we can define the
transition probability matrix of the process as the M×M matrix $\mathbf{P} = (p_{ij})$
with entries

$$p_{ij} \triangleq \Pr\{\xi_{n+1} = j \mid \xi_n = i\} \tag{2.55}$$

(The i-th row of $\mathbf{P}$ is the probability distribution of $\xi_{n+1}$, given
that $\xi_n = i$.)

The further assumption that the chain is regular means that the
limit

$$\mathbf{P}^\infty = \lim_{n\to\infty} \mathbf{P}^n \tag{2.56}$$

exists; the matrix $\mathbf{P}^\infty$ is idempotent, i.e.,

$$(\mathbf{P}^\infty)^2 = \mathbf{P}^\infty \tag{2.57}$$

and its entries are

$$(\mathbf{P}^\infty)_{ij} = p_j \tag{2.58}$$

where

$$p_j = \Pr\{\xi_n = j\}, \qquad j = 1,\dots,M \tag{2.59}$$

describes the probability distribution of the random variables $\xi_n$.

The column vector $\mathbf{p}$ of these probabilities can be obtained by solving
the system of linear equations

$$\mathbf{P}'\,\mathbf{p} = \mathbf{p}, \qquad \mathbf{1}'\,\mathbf{p} = 1 \tag{2.60}$$

where $(\cdot)'$ denotes transpose, and $\mathbf{1}$ is a column vector all of whose
entries are 1.
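For a regular chain the solution of (2.60) can also be approached by simply iterating the transition matrix, which mirrors the limit (2.56). The sketch below uses this iteration in place of a direct linear solve; the two-state matrix `P_EXAMPLE`, the iteration count and the function name are assumptions made for the illustration.

```python
def stationary_distribution(P, iterations=500):
    """Iterate p <- P' p starting from a point mass on the first state.

    For a regular chain the iterates tend to the solution of P' p = p, 1' p = 1.
    """
    m = len(P)
    p = [1.0] + [0.0] * (m - 1)
    for _ in range(iterations):
        p = [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]
    return p


# HYPOTHETICAL two-state chain, used only to exercise the function.
P_EXAMPLE = [[0.9, 0.1],
             [0.5, 0.5]]

p_example = stationary_distribution(P_EXAMPLE)
```

For this chain the exact answer is (5/6, 1/6), which can be confirmed by substituting into (2.60).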
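Recall now (2.24); defining the column vector

$$\mathbf{Q}(\omega) \triangleq [Q(\omega;1), Q(\omega;2), \dots, Q(\omega;M)]' \tag{2.61}$$

and the diagonal matrix $\mathbf{\Pi}$ whose non-zero entries are

$$\Pi_{ii} = p_i, \qquad i = 1,\dots,M \tag{2.62}$$

we can write

$$
\begin{aligned}
\rho(\omega;\ell) &= E\{Q(\omega;\xi_{\ell+k})\,Q^*(\omega;\xi_k)\} \\
&= \sum_{i=1}^{M} \sum_{j=1}^{M} \Pr\{\xi_k = i\}\, \Pr\{\xi_{\ell+k} = j \mid \xi_k = i\}\, Q(\omega;j)\,Q^*(\omega;i) \\
&= \mathbf{Q}^\dagger(\omega)\, \mathbf{\Pi}\, \mathbf{P}^\ell\, \mathbf{Q}(\omega)
\end{aligned} \tag{2.63}
$$

($\dagger$ denotes conjugate transpose).

Taking the limit as $\ell \to \infty$:

$$\rho(\omega;\infty) = \mathbf{Q}^\dagger(\omega)\, \mathbf{\Pi}\, \mathbf{P}^\infty\, \mathbf{Q}(\omega) \tag{2.64}$$

Thus,

$$\mathcal{G}_c(\omega) = \frac{1}{T} \sum_{\ell=-\infty}^{\infty} [\rho(\omega;\ell) - \rho(\omega;\infty)]\, e^{-j\ell\omega T}$$

observing that $\rho(\omega;-n) = \rho^*(\omega;n)$,

$$= \frac{2}{T}\, \mathrm{Re}\left\{ \sum_{\ell=0}^{\infty} [\rho(\omega;\ell) - \rho(\omega;\infty)]\, e^{-j\ell\omega T} \right\} - \frac{1}{T}\, [\rho(\omega;0) - \rho(\omega;\infty)]$$

using (2.63)-(2.64),

$$= \frac{2}{T}\, \mathrm{Re}\left\{ \mathbf{Q}^\dagger(\omega)\, \mathbf{\Pi} \sum_{\ell=0}^{\infty} (\mathbf{P}^\ell - \mathbf{P}^\infty)\, e^{-j\ell\omega T}\, \mathbf{Q}(\omega) \right\} - \frac{1}{T}\, \mathbf{Q}^\dagger(\omega)\, \mathbf{\Pi}\, (\mathbf{I} - \mathbf{P}^\infty)\, \mathbf{Q}(\omega) \tag{2.65}$$

Observe now that $\mathbf{P}^\infty \mathbf{P} = \mathbf{P}\mathbf{P}^\infty = \mathbf{P}^\infty$; thus

$$\mathbf{P}^\ell - \mathbf{P}^\infty = \begin{cases} \mathbf{I} - \mathbf{P}^\infty & \ell = 0 \\ (\mathbf{P} - \mathbf{P}^\infty)^\ell & \ell \ne 0 \end{cases} \tag{2.66}$$

and (2.65) can be rewritten as follows:

$$\mathcal{G}_c(\omega) = \frac{2}{T}\, \mathrm{Re}\left\{ \mathbf{Q}^\dagger(\omega)\, \mathbf{\Pi}\, \mathbf{\Lambda}(\omega)\, \mathbf{Q}(\omega) \right\} - \frac{1}{T}\, \mathbf{Q}^\dagger(\omega)\, \mathbf{\Pi}\, (\mathbf{I} + \mathbf{P}^\infty)\, \mathbf{Q}(\omega) \tag{2.67}$$

where

$$\mathbf{\Lambda}(\omega) \triangleq \sum_{\ell=0}^{\infty} (\mathbf{P} - \mathbf{P}^\infty)^\ell\, e^{-j\ell\omega T} \tag{2.68}$$

In Appendix B, some techniques are shown for the computation of the matrix
series (2.68).

In a similar way, we can derive the expression for the discrete part
of the spectrum:

$$\mathcal{G}_d(\omega) = \frac{2\pi}{T^2}\, \mathbf{Q}^\dagger(\omega)\, \mathbf{\Pi}\, \mathbf{P}^\infty\, \mathbf{Q}(\omega) \sum_{n=-\infty}^{\infty} \delta\!\left(\omega - n\,\frac{2\pi}{T}\right) \tag{2.69}$$

Eqs. (2.67) and (2.69) were first obtained, in scalar form, by Tausworthe
and Welch (1961). Subsequently, they were independently rediscovered by
several other researchers.

For a further specialization of (2.67)-(2.69) to constant-length
alphabetic codes, see Cariolaro and Tronca (1974). Gilchrist and Thomas
(1975) consider the computation of the power spectrum for algebraic
error-correcting codes.

No result seems to be available on the power spectrum shaping obtained
using convolutional codes; the analysis could be carried out as a
rather straightforward application of (2.67)-(2.69).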

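An example

Consider the Miller code (§1.2). The coded sequence can be represented
as a 4-state Markov chain with transition probability matrix

$$\mathbf{P} = \frac{1}{2} \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{bmatrix}$$

The vector $\mathbf{Q}(\omega)$ is given by

$$\mathbf{Q}(\omega) = G(\omega) \begin{bmatrix} 1 + e^{-j\omega T/2} \\ 1 - e^{-j\omega T/2} \\ -1 + e^{-j\omega T/2} \\ -1 - e^{-j\omega T/2} \end{bmatrix}$$

where $G(\omega)$ is the Fourier transform of

$$g(t) = \begin{cases} 1 & 0 < t < T/2 \\ 0 & \text{elsewhere} \end{cases}$$

We get

$$\mathbf{P}^\infty = \frac{1}{4} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

$$\mathbf{A} \triangleq \mathbf{P} - \mathbf{P}^\infty = \frac{1}{4} \begin{bmatrix} -1 & +1 & -1 & +1 \\ -1 & -1 & +1 & +1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & +1 & -1 \end{bmatrix}$$

By application of the Faddeev algorithm (Appendix B), we get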

= ^2  = y i 6 3 = = 0 


0 

1 

4 0 


1 0 
0 1 


b3  = o . 


In  conclusion,  we  get 


#(w)  =»  <8  (to)  3 - ( 3^n  ^ 3 * coo  toT/2  2ao8  toT  - jor.V)I72 

' toT/4  ^ 9 + 12 cos  toT  + 4c\  a 2;.  T 
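The continuous spectrum of this example can also be evaluated numerically, straight from (2.67)-(2.68). The sketch below is an illustration under stated assumptions: the transition matrix and waveform vector are taken to be those of the four-state Miller chain ($\mathbf{P}$ with rows (0,1,0,1)/2, (0,0,1,1)/2, (1,1,0,0)/2, (1,0,1,0)/2, and waveforms built from a half-symbol rectangle), and the matrix series is simply truncated after an arbitrary 200 terms instead of being summed in closed form as in Appendix B. All names are hypothetical.

```python
import cmath
import math

# ASSUMED Miller-code chain: transition matrix, its limit, stationary probabilities.
P = [[0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.5, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0]]
P_INF = [[0.25] * 4 for _ in range(4)]
PROB = [0.25] * 4          # diagonal of the matrix Pi


def q_vector(omega, T):
    """Fourier transforms of the four waveforms, from g(t) = 1 on (0, T/2)."""
    if omega == 0.0:
        G = T / 2
    else:
        G = (1 - cmath.exp(-1j * omega * T / 2)) / (1j * omega)
    e = cmath.exp(-1j * omega * T / 2)
    return [G * (1 + e), G * (1 - e), G * (-1 + e), G * (-1 - e)]


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def continuous_spectrum(omega, T, terms=200):
    """Eq. (2.67), with the series (2.68) truncated after `terms` terms."""
    Q = q_vector(omega, T)
    A = [[P[i][j] - P_INF[i][j] for j in range(4)] for i in range(4)]
    z = cmath.exp(-1j * omega * T)
    term, lam_q = Q[:], [0j] * 4
    for _ in range(terms):                 # accumulates Lambda(omega) Q
        lam_q = [a + t for a, t in zip(lam_q, term)]
        term = [z * x for x in matvec(A, term)]
    first = sum(PROB[i] * Q[i].conjugate() * lam_q[i] for i in range(4))
    ipq = [Q[i] + sum(P_INF[i][j] * Q[j] for j in range(4)) for i in range(4)]
    second = sum(PROB[i] * Q[i].conjugate() * ipq[i] for i in range(4))
    return (2 * first.real - second.real) / T


value_at_2pi = continuous_spectrum(2 * math.pi, 1.0)
```

Truncation is harmless here because the powers of $\mathbf{P} - \mathbf{P}^\infty$ decay geometrically; at $\omega T = 2\pi$ a hand calculation with the same matrices gives $0.4\,T/\pi^2$, which the code reproduces.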
CHAPTER 3 - NONLINEAR CHANNELS WITH MEMORY


3.1  INTRODUCTION 

In this chapter, I shall consider the problem of characterizing
nonlinear channels with memory. Analysis of these channels, as
well as evaluation of the performance of digital transmission schemes, is
an important practical problem. For example, in telephone lines used for
data transmission, the advent of equalization, and hence of new precision
in transmission, revealed that nonlinear distortion -- arising principally
from inaccuracy in companding -- is a serious source of performance impairment.
It has been conjectured that, for data transmission systems operating
at rates greater than 4800 bits/s, the error rate is almost entirely
determined by nonlinear distortion (Lucky, 1975).

Another important example of nonlinear channel with memory arises
from digital satellite communications. The on-board amplifiers, operated
at or near saturation for better efficiency, exhibit strongly nonlinear
characteristics.

The Volterra-series approach has been taken because it provides the
most general analytical tool to deal with nonlinear channels with memory.
Although it suffers some drawbacks -- for example, like the impulse response
used to characterize a linear channel, it cannot be parametrized -- its
generality makes it attractive in several instances.

3.2  CHARACTERIZING  A NONLINEAR  CHANNEL 
For linear channels, the input-output relationship is fully described
by their impulse response. Similarly, if the channel is nonlinear but memoryless,
and sufficiently well-behaved, an input-output relationship can be
obtained by expanding the nonlinear characteristics in a power series.
More generally, for a nonlinear channel with memory that satisfies certain
regularity conditions, a generalization of these output representations
is provided by Volterra series. Entry to the literature on this subject
can be made through Bedrosian and Rice (1971). Here I shall present, mostly
in a heuristic way, some of the basic features of this theory and certain
of its applications.

To provide some motivation for the general input-output relationship
of a nonlinear system with memory, let us consider a simple example.
Assume that the system is created (Fig. 3.1) by cascading a linear,
time-invariant system with impulse response h(t) and a nonlinear, memoryless
system with an analytic input-output relationship z(t)=f[v(t)].


FIG.  3.1 


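Let

$$f(v) = \sum_{k=1}^{\infty} \frac{\gamma_k}{k!}\, v^k \tag{3.1}$$

be the Taylor series expansion of $f(\cdot)$ (assume that f(0)=0, so the term
with k=0 is missing).

Denoting by z(t) the output when the input is u(t), we easily
get, under suitable regularity assumptions for the functions involved:

$$
\begin{aligned}
z(t) &= f\!\left[ \int_{-\infty}^{\infty} h(\tau)\, u(t-\tau)\, d\tau \right] \\
&= \sum_{k=1}^{\infty} \frac{\gamma_k}{k!} \left[ \int_{-\infty}^{\infty} h(\tau)\, u(t-\tau)\, d\tau \right]^k \\
&= \sum_{k=1}^{\infty} \frac{\gamma_k}{k!} \int_{-\infty}^{\infty} d\tau_1 \int_{-\infty}^{\infty} d\tau_2 \cdots \int_{-\infty}^{\infty} d\tau_k \prod_{r=1}^{k} h(\tau_r)\, u(t-\tau_r)
\end{aligned} \tag{3.2}
$$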


Letting

$$h_k(\tau_1,\tau_2,\dots,\tau_k) \triangleq \frac{\gamma_k}{k!} \prod_{r=1}^{k} h(\tau_r) \tag{3.3}$$

the input-output relationship takes the following form:

$$z(t) = \sum_{k=1}^{\infty} \int_{-\infty}^{\infty} d\tau_1 \int_{-\infty}^{\infty} d\tau_2 \cdots \int_{-\infty}^{\infty} d\tau_k\; h_k(\tau_1,\tau_2,\dots,\tau_k) \prod_{r=1}^{k} u(t-\tau_r) \tag{3.4}$$

Eq.(3.4), without assumption (3.3), is the most general form of input-output
relationship of a time-invariant nonlinear system with memory that meets
certain regularity conditions. It can be seen that the "Volterra kernels"
$h_k(\tau_1,\tau_2,\dots,\tau_k)$, a generalization of the impulse response for linear systems,
completely describe the system behavior. Thus, the problem of characterizing
a nonlinear system with memory reduces to the problem of computing its
Volterra kernels.
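A discrete-time analogue makes the equivalence between the cascade and its Volterra series easy to check. The sketch below is only an illustration: integrals are replaced by finite sums, the filter taps, the Taylor coefficients and the input samples are arbitrary assumed values, and the nonlinearity is a polynomial truncated at k = 3. It compares the direct computation (filter, then nonlinearity) with the sum over product kernels of the form (3.3).

```python
from itertools import product
from math import factorial, prod


def cascade_output(h, gammas, u, n):
    """Filter the input with taps h, then apply f(v) = sum_k gamma_k v^k / k!."""
    v = sum(h[m] * u[n - m] for m in range(len(h)) if 0 <= n - m < len(u))
    return sum(g / factorial(k) * v ** k for k, g in enumerate(gammas, start=1))


def volterra_output(h, gammas, u, n):
    """Same output from the kernels h_k(t1..tk) = gamma_k/k! * prod_r h(t_r)."""
    total = 0.0
    for k, g in enumerate(gammas, start=1):
        for taus in product(range(len(h)), repeat=k):
            if all(0 <= n - t < len(u) for t in taus):
                kernel = g / factorial(k) * prod(h[t] for t in taus)
                total += kernel * prod(u[n - t] for t in taus)
    return total


# ASSUMED example values (not taken from the text).
H_TAPS = [1.0, 0.5, -0.25]
GAMMAS = [1.0, 0.4, -0.3]            # gamma_1, gamma_2, gamma_3
U = [0.3, -1.0, 0.7, 0.2, -0.5]

direct = [cascade_output(H_TAPS, GAMMAS, U, n) for n in range(len(U))]
series = [volterra_output(H_TAPS, GAMMAS, U, n) for n in range(len(U))]
```

The kernel sum costs on the order of (number of taps)^k operations for the k-th term, which is why the product structure of (3.3) is worth exploiting whenever it is available.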

We have seen that, for the simple channel obtained by cascading a
linear system and a memoryless nonlinear system, the Volterra kernels are
given by Eq.(3.3). In a similar way, it can be shown that, for a system
which can be modeled as a memoryless nonlinearity $f(\cdot)$ preceded and
followed by two linear systems (Fig. 3.2),

FIG. 3.2

$$h_k(\tau_1,\dots,\tau_k) = \frac{\gamma_k}{k!} \int_{-\infty}^{\infty} h''(\tau) \prod_{r=1}^{k} h'(\tau_r - \tau)\, d\tau \tag{3.5}$$

where $\gamma_k$ are the power series coefficients of $f(\cdot)$, and $h'(\cdot)$, $h''(\cdot)$
are the impulse responses of the linear systems preceding and following
the nonlinearity, respectively.

Eq.(3.5) has a simple interpretation, which may also be useful
to compute numerically the Volterra kernels. Fix $\tau_1,\dots,\tau_{s-1},\tau_{s+1},\dots,\tau_k$;
then rewrite (3.5) as follows:

$$h_k(\tau_1,\dots,\tau_k) = \frac{\gamma_k}{k!} \int_{-\infty}^{\infty} d\tau \left[ h''(\tau) \prod_{\substack{r=1 \\ r \ne s}}^{k} h'(\tau_r - \tau) \right] h'(\tau_s - \tau)$$

This expression, apart from a constant coefficient, can be viewed
as the response, at time $\tau_s$, of a linear system with impulse response
$h'(\cdot)$ to the input

$$h''(t) \prod_{\substack{r=1 \\ r \ne s}}^{k} h'(\tau_r - t).$$

Using standard numerical filtering techniques, if $\gamma_k$, $h'(\cdot)$ and
$h''(\cdot)$ are known, it is possible to compute any kernel for any prescribed
set of time instants $\tau_1,\dots,\tau_k$.

It can be observed that, in both equations (3.3) and (3.5), the
kernels are symmetrical functions of their arguments; i.e., kernel values
do not change if their arguments are permuted. It can be shown (Bedrosian
and Rice, 1971) that the assumption of symmetry for the kernels does not
entail any loss of generality so, throughout this chapter, all Volterra
kernels will be assumed to be symmetric, unless otherwise specified.

3.3  SOME  EXAMPLES 

In this section, the expression of the output from a channel described
by a Volterra series will be specialized to some cases of practical
relevance. In particular, a digital signal, a general harmonizable random
process and a sum of two such processes will be considered as inputs to
the channel. The corresponding outputs will be derived in a form that will
allow some analysis of their statistics.

Consider first a baseband, linearly modulated digital signal

$$x(t) = \sum_{k=-\infty}^{\infty} c_k\, q(t-kT) \tag{3.6}$$

where $(c_n)_{n=-\infty}^{\infty}$ is a real, discrete-time stationary random process.
We can assume that (3.6) is obtained by passing the following signal:

$$u(t) = \sum_{k=-\infty}^{\infty} c_k\, \delta(t-kT) \tag{3.7}$$

through a linear, time-invariant system whose impulse response is q(t).
This linear system will be included in the channel structure, and the
Volterra kernels of the channel will be modified accordingly.

If the signal (3.7) is sent through a channel described by the
general input-output relationship (3.4), we get at the output

$$v(t) = \sum_{k=1}^{\infty} \sum_{n_1} \sum_{n_2} \cdots \sum_{n_k} c_{n_1} c_{n_2} \cdots c_{n_k}\; h_k(t-n_1T,\, t-n_2T, \dots, t-n_kT) \tag{3.8}$$

where the indices $n_1, n_2, \dots, n_k$ run from $-\infty$ to $\infty$. In particular, if
we sample the signal v(t) at $t=t_0$, we get

$$v(t_0) = \sum_{k=1}^{\infty} \sum_{n_1} \sum_{n_2} \cdots \sum_{n_k} c_{n_1} c_{n_2} \cdots c_{n_k}\; H_k(n_1,n_2,\dots,n_k) \tag{3.9}$$

where

$$H_k(n_1,n_2,\dots,n_k) \triangleq h_k(t_0-n_1T,\, t_0-n_2T, \dots, t_0-n_kT) \tag{3.10}$$

Eq.(3.9) can also be rewritten as follows:

$$v(t_0) = c_0\, H_1(0) + \sum_{n \ne 0} c_n\, H_1(n) + \sum_{k=2}^{\infty} \sum_{n_1} \cdots \sum_{n_k} c_{n_1} \cdots c_{n_k}\; H_k(n_1,\dots,n_k) \tag{3.11}$$

where one can easily recognize the various contributions to the signal at
the output of the channel: useful signal, intersymbol interference produced
by linear distortion, nonlinear interference.

Assume now that the input of the nonlinear channel is a real, harmonizable process $u(t)$. Using the series expansion for these random processes devised by Cambanis and Liu (1971) (a heuristic proof of this result is presented in Appendix C for the special case of wide-sense stationary processes), we can represent $u(t)$ as

$$u(t) = \sum_{n} \xi_n\, s_n(t) \tag{3.12}$$

where $(\xi_n)_{n=-\infty}^{\infty}$ is a real, discrete-time random process such that

$$E\{\xi_n \xi_m\} = \delta_{mn} \tag{3.13}$$


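At the output of the channel, we get

$$v(t) = \sum_{k=1}^{\infty} \sum_{n_1} \sum_{n_2} \cdots \sum_{n_k} \xi_{n_1} \xi_{n_2} \cdots \xi_{n_k}\, \sigma_k(t; n_1, n_2, \ldots, n_k) \tag{3.14}$$

where

$$\sigma_k(t; n_1, n_2, \ldots, n_k) = \int_{-\infty}^{\infty} d\tau_1 \int_{-\infty}^{\infty} d\tau_2 \cdots \int_{-\infty}^{\infty} d\tau_k\, h_k(\tau_1, \tau_2, \ldots, \tau_k) \prod_{r=1}^{k} s_{n_r}(t-\tau_r) \tag{3.15}$$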


An important special case arises when $s_n(t)$ satisfies the following relation:

$$s_n(t) = s(t-n\theta). \tag{3.16}$$

(Besides the special case of digital signals, (3.16) holds true for example when $u(t)$ is a bandlimited white noise: see Appendix C.) In this case

$$s_{n_r}(t-\tau_r) = s(t-\tau_r-n_r\theta)$$

so that, for each value of $k$, only one integral has to be computed to obtain the kernels (3.15).

Consider finally the sum of two random processes, say

$$u(t) = x(t) + y(t).$$

Assume that they can be expanded in the following form

$$x(t) = \sum_{n} \xi_n\, s_n(t), \qquad y(t) = \sum_{n} \eta_n\, r_n(t) \tag{3.17}$$

Generally, this can be obtained by using the Cambanis-Liu expansion, or using eq.(3.6), if $x(t)$ or $y(t)$ or both are digital signals.

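The k-th order term in the Volterra-series expansion of the output of the nonlinearity is given by

$$v_k(t) = \int_{-\infty}^{\infty} d\tau_1 \cdots \int_{-\infty}^{\infty} d\tau_k\, h_k(\tau_1, \ldots, \tau_k) \prod_{r=1}^{k} \left[ x(t-\tau_r) + y(t-\tau_r) \right] \tag{3.18}$$

Expanding the product and collecting, for each $i$, the terms with $k-i$ factors from $x(t)$ and $i$ factors from $y(t)$ into kernels $\rho_{k-i,i}(t; n_1, \ldots, n_k)$, we can write the output of the nonlinearity, say $v(t)$, as

$$v(t) = \sum_{k=1}^{\infty} \sum_{i=0}^{k} \sum_{n_1} \cdots \sum_{n_k} \xi_{n_1} \cdots \xi_{n_{k-i}}\, \eta_{n_{k-i+1}} \cdots \eta_{n_k}\, \rho_{k-i,i}(t; n_1, \ldots, n_k) \tag{3.20}$$

Notice that the terms $\rho_{k-i,i}(\cdots)$ account for the interaction between the two processes. In particular, $\rho_{k,0}(\cdots)$, $1 \le k < \infty$, give the output corresponding to the input $x(t)$ alone, whereas $\rho_{0,k}(\cdots)$, $1 \le k < \infty$, do the same thing for the input $y(t)$ alone.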


3.4  FIRST-ORDER STATISTICS OF THE OUTPUT OF A NONLINEAR CHANNEL

In  this  section,  we  shall  consider  the  problem  of  evaluating  the 
first-order  statistics  of  the  signal  at  the  output  of  a nonlinear  channel. 

In Appendix D, it is shown how it is possible, from the knowledge of a few moments of the random variable $X$, to derive bounds to quantities like

$$E\{Q(X)\}$$

where $Q(\cdot)$ is a known function, or

$$\Pr\{X \le \lambda\}$$

where $\lambda$ is a known quantity.

Thus, the problem of deriving the first-order statistics of a random variable can be reduced to the problem of computing its moments. Consider, in particular, the problem of computing the moments of the process at the output of a nonlinear channel with memory. Let us fix a time instant $t_0$, and write the channel output at $t_0$ as

$$\xi = \sum_{k=1}^{\infty} \sum_{n_1} \sum_{n_2} \cdots \sum_{n_k} a_{n_1} a_{n_2} \cdots a_{n_k}\, S_k(n_1, n_2, \ldots, n_k) \tag{3.21}$$

It can be observed that expression (3.21) encompasses both cases (3.9) and (3.14), with proper definition of the quantities involved.

The problem of computing the moments $E\{\xi^{\ell}\}$ will be solved in two steps: first, I shall show that $\xi^2, \xi^3, \ldots$ can be given the form of Volterra series. Second, I shall develop a procedure for averaging a Volterra series (see Benedetto, Biglieri and Daffara (1976)).

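Consider the two Volterra series

$$\sum_{k=1}^{\infty} \sum_{n_1} \cdots \sum_{n_k} a_{n_1} \cdots a_{n_k}\, F_k(n_1, \ldots, n_k) \tag{3.22}$$

and

$$\sum_{k=1}^{\infty} \sum_{n_1} \cdots \sum_{n_k} a_{n_1} \cdots a_{n_k}\, G_k(n_1, \ldots, n_k) \tag{3.23}$$

The product between them can be given the form

$$\sum_{k=1}^{\infty} \sum_{n_1} \cdots \sum_{n_k} a_{n_1} \cdots a_{n_k}\, Q_k(n_1, \ldots, n_k) \tag{3.24}$$

where the resulting kernels $Q_k(\cdots)$ are obtained through the recursion

$$Q_k(n_1, \ldots, n_k) = \sum_{i=1}^{k-1} F_i(n_1, \ldots, n_i)\, G_{k-i}(n_{i+1}, \ldots, n_k), \quad k \ge 2; \qquad Q_1 = 0 \tag{3.25}$$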


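Using this result, we can see that the power $\xi^{\ell}$ can be given the form

$$\xi^{\ell} = \sum_{k=\ell}^{\infty} \sum_{n_1} \cdots \sum_{n_k} a_{n_1} \cdots a_{n_k}\, S_k^{(\ell)}(n_1, \ldots, n_k) \tag{3.26}$$

where the Volterra coefficients $S_k^{(\ell)}(\cdots)$ are obtained recursively as

$$S_k^{(\ell)}(n_1, \ldots, n_k) = \sum_{i=\ell-1}^{k-1} S_i^{(\ell-1)}(n_1, \ldots, n_i)\, S_{k-i}(n_{i+1}, \ldots, n_k) \tag{3.27}$$

with the starting values

$$S_i^{(1)}(n_1, \ldots, n_i) = S_i(n_1, \ldots, n_i) \tag{3.28}$$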


In conclusion, any power of $\xi$ can be expressed as a Volterra series whose coefficients can be derived through a recurrence relationship. Thus, the computation of the moments of $\xi$ is reduced to the computation of the average of a Volterra series.
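The product rule (3.25) and the resulting recurrence for the powers of the series can be checked numerically on a toy case. The sketch below assumes kernels of finite support, stored as NumPy arrays with one axis per index $n_i$, and a series truncated at order 2; the helper names (`evaluate`, `product_kernels`, `power_kernels`) are hypothetical.

```python
import numpy as np


def evaluate(kernels, a):
    """Sum over k of sum_{n1..nk} a[n1]...a[nk] * kernels[k][n1, ..., nk]."""
    total = 0.0
    for k, ker in kernels.items():
        term = np.asarray(ker, dtype=float)
        for _ in range(k):
            # Contract the leading index of the kernel with the sequence a.
            term = np.tensordot(a, term, axes=([0], [0]))
        total += float(term)
    return total


def product_kernels(F, G):
    """Kernels of the product series: Q_k = sum_i F_i (outer product) G_{k-i}."""
    Q = {}
    for i, Fi in F.items():
        for j, Gj in G.items():
            outer = np.multiply.outer(Fi, Gj)
            Q[i + j] = Q[i + j] + outer if (i + j) in Q else outer
    return Q


def power_kernels(S, ell):
    """Kernels S^(ell) of the ell-th power, built by repeated products with S."""
    result = dict(S)
    for _ in range(ell - 1):
        result = product_kernels(result, S)
    return result


rng = np.random.default_rng(0)
W = 3                                   # assumed number of symbols in the memory window
a = rng.standard_normal(W)
S = {1: rng.standard_normal(W), 2: rng.standard_normal((W, W))}

xi = evaluate(S, a)
S3 = power_kernels(S, 3)
xi_cubed_from_kernels = evaluate(S3, a)
```

The kernels produced this way are in general not symmetric in their arguments, which does not affect the value of the series.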


I shall describe in the following a simple procedure for computing such an average, under the hypothesis that the random variables $a_{n_i}$ are statistically independent and identically distributed.

Due to the structure of (3.26), it can be seen that $E\{\xi^{\ell}\}$ will be obtained provided that we are able to compute the average

$$E\{a_{n_1} a_{n_2} \cdots a_{n_k}\}$$

Under the independence hypothesis, this average factors into a product of moments of a single random variable, one for each group of coinciding indices.

A similar result applies to the computation of the moments in a situation like that represented by eq.(3.20) (see Benedetto, Biglieri and Daffara (1976) for details), when the sum of two independent processes enters the nonlinear channel.
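For independent, identically distributed symbols, the average of a product of symbols depends only on how many times each index is repeated. A minimal sketch of this bookkeeping follows; the function names and the equiprobable binary alphabet are assumptions made for the illustration.

```python
from collections import Counter
from itertools import product


def average_of_product(indices, moment):
    """E{a_{n1} ... a_{nk}} for i.i.d. symbols; moment(m) must return E{a^m}."""
    result = 1.0
    for multiplicity in Counter(indices).values():
        result *= moment(multiplicity)
    return result


def binary_moment(m):
    # Assumed alphabet: a = +1 or -1 with equal probability.
    return 1.0 if m % 2 == 0 else 0.0


def brute_force_average(indices, alphabet=(-1.0, 1.0)):
    """Same average by exhaustive enumeration over the distinct symbols involved."""
    distinct = sorted(set(indices))
    total = 0.0
    for values in product(alphabet, repeat=len(distinct)):
        assignment = dict(zip(distinct, values))
        term = 1.0
        for n in indices:
            term *= assignment[n]
        total += term
    return total / len(alphabet) ** len(distinct)
```

For example, `average_of_product((0, 0, 1, 1), binary_moment)` uses two second-order moments, whereas any index appearing an odd number of times makes the average vanish for this alphabet.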

As  an  application  of  these  procedures,  see  Benedetto,  Biglieri 
and  Daffara  (1976),  where  the  error  probability  for  PAM  transmission 
over  a nonlinear  channel  with  memory  is  computed. 




3.5  BANDPASS  NONLINEAR  SYSTEMS 


In  this  Section,  the  results  obtained  previously  --an  input-output 
relationship  valid  for  nonlinear  systems  with  memory  --  will  be  specialized 
to  the  case  of  bandpass  nonlinear  systems. 

Consider such a system, and a bandpass input. The analytic signal associated with the input can be expressed as:

$$x(t) = A(t)\, e^{j[\omega_0 t + \theta(t)]} \tag{3.30}$$

where $A(t)$ and $\theta(t)$ are baseband signals, and $\omega_0$ is the center frequency of the power spectrum of $x(t)$. Letting

$$\psi(t) \triangleq \omega_0 t + \theta(t) \tag{3.31}$$

we can write (3.30) as

$$x(t) = A(t)\, e^{j\psi(t)}. \tag{3.32}$$

Consider now a nonlinear, memoryless system with input-output relationship given by

$$y(t) = f[x(t)] = f[A e^{j\psi}] \tag{3.33}$$

where $y(t)$ is the output signal.

This is a periodic function of $\psi$, with period $2\pi$. So we can represent it as a Fourier series:

$$y(t) = \sum_{n=-\infty}^{\infty} c_n(A)\, e^{jn\psi} \tag{3.34}$$

where

$$c_n(A) = \frac{1}{2\pi} \int_0^{2\pi} f[A e^{j\psi}]\, e^{-jn\psi}\, d\psi \tag{3.35}$$


Due to the definition of $\psi$, (3.34) shows that the power spectrum of $y(t)$ will generally include several spectral zones, centered around multiples of the frequency $\omega_0$. Suppose then that the output of the nonlinearity is followed by a zonal filter, whose function is to stop all the spectral components other than that centered at $\omega_0$.

The analytic signal associated with this output component is given by

$$y_1(t) = c_1[A(t)]\, e^{j\psi(t)} \tag{3.36}$$

where

$$c_1(A) = \frac{1}{2\pi} \int_0^{2\pi} f[A e^{j\psi}]\, e^{-j\psi}\, d\psi \tag{3.37}$$


Since in general $c_1(A)$ is a complex number, we can write it in the following form:

$$c_1(A) = F(A)\, e^{j\phi(A)} \tag{3.38}$$

so that the output $y_1(t)$ becomes

$$y_1(t) = F[A(t)]\, e^{j\{\psi(t) + \phi[A(t)]\}} \tag{3.39}$$


Comparing  (3.39)  with  (3.32),  we  can  see  that  the  effect  of  a 
nonlinear  bandpass  system  will  be  to  alter  the  amplitude  as  well  as  the 
phase  of  the  input  according  to  a law  that  depends  only  on  A(t)  , the 
envelope  of  the  input. 

Thus, to describe a nonlinear bandpass system without memory, it is sufficient to assign two functions $F(A)$, $\phi(A)$, describing the so-called AM/AM conversion and AM/PM conversion of the system. These functions can be suitably parametrized in order to get a useful characterization of the system in terms of a small set of parameters (see Lindsey et al. (1977), and references therein).
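The first-zone coefficient can be evaluated numerically for a given memoryless characteristic. The sketch below makes two assumptions that go beyond the text: the nonlinearity `g` acts on the real bandpass signal $A\cos\psi$, and the envelope of the first-zone output is taken as twice the magnitude of the first Fourier coefficient. The cubic characteristic and its coefficient `gamma3` are hypothetical.

```python
import numpy as np


def first_zone_coefficient(g, A, num_points=64):
    """First Fourier coefficient of g(A cos(psi)) over one period of psi."""
    psi = 2 * np.pi * np.arange(num_points) / num_points
    samples = g(A * np.cos(psi))
    # A uniform average over one period is exact for low-degree trigonometric polynomials.
    return np.mean(samples * np.exp(-1j * psi))


gamma3 = -0.1                                  # hypothetical cubic coefficient


def g(x):
    return x + gamma3 * x ** 3


A = 1.5
c1 = first_zone_coefficient(g, A)
am_am = 2 * abs(c1)                            # envelope of the first-zone output
am_pm = np.angle(c1)                           # phase shift introduced by the device
expected_am_am = A + 0.75 * gamma3 * A ** 3    # known first-harmonic amplitude of a cubic law
```

A real memoryless characteristic gives no phase conversion, so `am_pm` is zero here; a complex $c_1(A)$, as measured on traveling-wave tubes, is what produces a nonzero $\phi(A)$.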

Consider now the more general case in which the nonlinear system has memory. Rewrite first the input of the system as

$$x(t) = \tilde{x}(t)\, e^{j\omega_0 t} \tag{3.40}$$

where $\tilde{x}(t)$, the complex envelope of the input, is defined as

$$\tilde{x}(t) = x(t)\, e^{-j\omega_0 t} \tag{3.41}$$


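The output of the system, under suitable regularity hypotheses, can be written in terms of a Volterra series as

$$y(t) = \sum_{\ell=1}^{\infty} \int_{-\infty}^{\infty} d\tau_1 \cdots \int_{-\infty}^{\infty} d\tau_\ell\, h_\ell(\tau_1, \ldots, \tau_\ell) \prod_{r=1}^{\ell} \mathrm{Re}[x(t-\tau_r)] \tag{3.42}$$

where

$$\mathrm{Re}[x(t)] = \tfrac{1}{2}\left[ x(t) + x^*(t) \right] = \tfrac{1}{2}\left[ \tilde{x}(t)\, e^{j\omega_0 t} + \tilde{x}^*(t)\, e^{-j\omega_0 t} \right] \tag{3.43}$$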


Observe now that, with $\mathrm{Re}[x(t)]$ expressed as in (3.43), if we expand the products at RHS of (3.42), each will give rise to a sum of $2^{\ell}$ products. Each of these products involves $\ell$ factors of the type $\exp\{\pm j\omega_0(t-\tau_r)\}$.

Suppose, as we did before, that only the first-zone spectral components of the signal at the output of the nonlinearity are of interest. This means that among these products we need retain only those which give rise to a factor $\exp\{\pm j\omega_0 t\}$.

Thus, we are constrained to consider only the products corresponding to odd values of $\ell$, say $\ell = 2k+1$, $k = 0, 1, \ldots$. Among these, we retain the products with $k$ (respectively, $k+1$) factors of the type $\exp\{-j\omega_0(t-\tau_r)\}$ and $k+1$ (respectively, $k$) factors of the type $\exp\{+j\omega_0(t-\tau_r)\}$.

There are $\binom{2k+1}{k}$ of these products, and each will give the same value for the $\ell$-fold integral under the assumption of symmetric kernels.
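The count of retained products can be verified by brute force: each of the $\ell$ factors contributes a sign $+1$ or $-1$ to the exponent of $e^{j\omega_0 t}$, and a product falls in the first zone with positive frequency when the signs add up to $+1$. The short sketch below enumerates the sign patterns; the function name is hypothetical.

```python
from itertools import product
from math import comb


def count_first_zone_products(ell):
    """Number of sign patterns (s_1, ..., s_ell), s_r = +1 or -1, with sum equal to +1."""
    return sum(1 for signs in product((1, -1), repeat=ell) if sum(signs) == 1)


# Odd orders ell = 2k+1 give binomial(2k+1, k) products; even orders give none.
counts_odd = {k: count_first_zone_products(2 * k + 1) for k in range(5)}
counts_even = {k: count_first_zone_products(2 * k) for k in range(1, 5)}
```

By symmetry the same number of products carries the factor $e^{-j\omega_0 t}$.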

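Thus, denoting by $y_1(t)$ the first-zone filtered component of the output signal,

$$y_1(t) = \tfrac{1}{2}\left[ e^{j\omega_0 t}\, \tilde{y}(t) + e^{-j\omega_0 t}\, \tilde{y}^*(t) \right] \tag{3.43}$$

where

$$\tilde{y}(t) = \sum_{k=0}^{\infty} \frac{1}{2^{2k}} \binom{2k+1}{k} \int_{-\infty}^{\infty} d\tau_1 \cdots \int_{-\infty}^{\infty} d\tau_{2k+1}\, h_{2k+1}(\tau_1, \ldots, \tau_{2k+1}) \prod_{r=1}^{k} \tilde{x}^*(t-\tau_r)\, e^{j\omega_0 \tau_r} \prod_{s=k+1}^{2k+1} \tilde{x}(t-\tau_s)\, e^{-j\omega_0 \tau_s} \tag{3.44}$$

Since (3.43) can be rewritten as

$$y_1(t) = \mathrm{Re}\left[ \tilde{y}(t)\, e^{j\omega_0 t} \right], \tag{3.45}$$

we can identify $\tilde{y}(t)$ with the complex envelope of $y_1(t)$.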

The expression for $\tilde{y}(t)$ will now be modified in order to define equivalent, low-pass Volterra kernels. For this purpose, we must exploit the hypothesis that the nonlinear device is bandpass. This means that the Fourier transforms of its Volterra kernels:

$$\mathcal{H}_{2k+1}(\omega_1, \omega_2, \ldots, \omega_{2k+1}) \triangleq \int_{-\infty}^{\infty} d\tau_1 \cdots \int_{-\infty}^{\infty} d\tau_{2k+1}\, h_{2k+1}(\tau_1, \ldots, \tau_{2k+1}) \exp\{-j(\omega_1 \tau_1 + \omega_2 \tau_2 + \cdots + \omega_{2k+1} \tau_{2k+1})\} \tag{3.46}$$

differ significantly from zero only over small neighborhoods of the points with coordinates $(\pm\omega_0, \pm\omega_0, \ldots, \pm\omega_0)$.

Thus, we can write $\mathcal{H}_{2k+1}(\cdots)$ as a sum of $2^{2k+1}$ functions with arguments $(\omega_1 \pm \omega_0, \omega_2 \pm \omega_0, \ldots, \omega_{2k+1} \pm \omega_0)$; each of these functions is significantly different from zero only in the neighborhood of the origin.

To prove this, consider, as system input, a sinusoidal signal with frequency $\omega'$ and amplitude $A$. The complex envelope of the output is, from (3.44):

$$\tilde{y}(t) = e^{j(\omega'-\omega_0)t} \sum_{k=0}^{\infty} \frac{1}{2^{2k}} \binom{2k+1}{k} A^{2k+1}\, \mathcal{H}_{2k+1}(\underbrace{\omega', \ldots, \omega'}_{k+1}, \underbrace{-\omega', \ldots, -\omega'}_{k}) \tag{3.47}$$

For the system to be bandpass, $\tilde{y}(t)$ must be significantly different from zero only for $\omega' \approx \omega_0$. This is actually the case only if the Volterra kernels have a functional structure like that just described.

Take now the inverse Fourier transform of $\mathcal{H}_{2k+1}(\cdots)$. In general, denoting by $\mathcal{F}_\ell$ the $\ell$-dimensional Fourier-transform operator, we have

$$\mathcal{F}_\ell^{-1}[X(\omega_1-\omega_0, \omega_2-\omega_0, \ldots)] = e^{j\omega_0(\tau_1+\tau_2+\cdots)}\, \mathcal{F}_\ell^{-1}[X(\omega_1, \omega_2, \ldots)]. \tag{3.48}$$

Thus, the inverse Fourier transform of $\mathcal{H}_{2k+1}(\cdots)$ will be a sum of $2^{2k+1}$ baseband functions (kernels), each one being multiplied by a factor

$$\exp\{-j\omega_0(\pm\tau_1 \pm \tau_2 \pm \cdots \pm \tau_{2k+1})\} \tag{3.49}$$

Observe now that, when such an inverse Fourier transform is substituted for $h_{2k+1}(\cdots)$ in (3.44), the exponential factors in the integral may combine with that appearing in the RHS of (3.48), giving rise to terms $\exp\{\pm jn\omega_0\tau_i\}$, $n \neq 0$. These terms will make the integrand oscillate at frequencies not less than $\omega_0$, so the corresponding values of the integrals will be relatively small.

Thus, the only term to be retained in the inverse Fourier transform of $\mathcal{H}_{2k+1}(\cdots)$ is the one whose exponential factor cancels out with those appearing in the RHS of (3.44). Thus, for our purposes we can approximate $h_{2k+1}(\tau_1, \ldots, \tau_{2k+1})$ in (3.44) as follows:

$$h_{2k+1}(\tau_1, \ldots, \tau_{2k+1}) = \tfrac{1}{2}\, e^{-j\omega_0(\tau_1 + \cdots + \tau_k - \tau_{k+1} - \cdots - \tau_{2k+1})}\, \tilde{h}_{2k+1}(\tau_1, \ldots, \tau_{2k+1}) \tag{3.50}$$

and call $\tilde{h}_{2k+1}(\cdots)$, defined in this way, the "equivalent low-pass Volterra kernels". Thus, eq.(3.44) becomes

$$\tilde{y}(t) = \sum_{k=0}^{\infty} \frac{1}{2^{2k+1}} \binom{2k+1}{k} \int_{-\infty}^{\infty} d\tau_1 \cdots \int_{-\infty}^{\infty} d\tau_{2k+1}\, \tilde{h}_{2k+1}(\tau_1, \ldots, \tau_{2k+1}) \prod_{r=1}^{k} \tilde{x}^*(t-\tau_r) \prod_{s=k+1}^{2k+1} \tilde{x}(t-\tau_s) \tag{3.51}$$

Eq. (3.51) gives the complex envelope of the output of a nonlinear system with memory in terms of the complex envelope of the input and the equivalent low-pass Volterra kernels.

Note in particular that, if we stop the expansion at RHS of (3.51) at the first term (linear system), we get the known result:

$$\tilde{y}(t) = \tfrac{1}{2} \int_{-\infty}^{\infty} \tilde{h}(\tau)\, \tilde{x}(t-\tau)\, d\tau \tag{3.52}$$

where $\tilde{h}(\cdot)$ is the equivalent low-pass impulse response of the linear system.


Example 1

As a simple example, let us consider a sinusoidal signal input with complex envelope

$$\tilde{x}(t) = A\, e^{j\theta} \tag{3.53}$$

(this corresponds to the real signal $x(t) = A\cos(\omega_0 t + \theta)$). Then the complex envelope of the output signal is given by

$$\tilde{y}(t) = e^{j\theta} \sum_{k=0}^{\infty} \frac{1}{2^{2k+1}} \binom{2k+1}{k} A^{2k+1}\, \beta_{2k+1} \tag{3.54}$$

where

$$\beta_{2k+1} = \int_{-\infty}^{\infty} d\tau_1 \cdots \int_{-\infty}^{\infty} d\tau_{2k+1}\, \tilde{h}_{2k+1}(\tau_1, \ldots, \tau_{2k+1}) \tag{3.55}$$

Using now (3.50), we can get

$$\beta_{2k+1} = 2\, \mathcal{H}_{2k+1}(\underbrace{\omega_0, \ldots, \omega_0}_{k+1}, \underbrace{-\omega_0, \ldots, -\omega_0}_{k}) \tag{3.56}$$

in accordance with the result of Bedrosian and Rice (1971).

Example 2

Consider now the transmission of a bandpass digital signal over this channel. I shall assume that the channel has been modeled in such a way that the complex envelope of the input signal takes the form:

$$\tilde{x}(t) = \sum_{n=-\infty}^{\infty} c_n\, \delta(t-nT) \tag{3.57}$$

where $(c_n)_{n=-\infty}^{\infty}$ is a discrete-time, complex random process. Using (3.51), we get

$$\tilde{y}(t) = \sum_{k=0}^{\infty} \eta_k \sum_{n_1} \cdots \sum_{n_{2k+1}} c^*_{n_1} \cdots c^*_{n_k}\, c_{n_{k+1}} \cdots c_{n_{2k+1}}\, \tilde{h}_{2k+1}(t-n_1T, \ldots, t-n_{2k+1}T) \tag{3.58}$$

where

$$\eta_k = \binom{2k+1}{k}\, 2^{-2k-1} \tag{3.59}$$


3.6  MODELING A SATELLITE CHANNEL USING BANDPASS VOLTERRA SERIES

As an example of actual computation of a Volterra series, let us consider the channel model represented in Fig. 3.3, which is usually assumed for satellite communications. Here a nonlinear memoryless part, representing the on-board traveling-wave tube, is preceded and followed by two bandpass linear systems. The first represents the cascade of earth station transmitting filter and satellite input filter; the other represents the cascade of satellite output filter and earth station receiving filter.

FIG. 3.3  BANDPASS NONLINEAR SYSTEM

Assume that $h'(\cdot)$, $h''(\cdot)$ are the impulse responses of the two filters, both of whose transfer functions $H'(\cdot)$, $H''(\cdot)$ are centered around the frequency $\omega_0$; assume also that the memoryless nonlinear device has an analytic input-output relationship of the form (3.1).

Thus, the Volterra series representation of such a system has kernels given by (3.5). Consider now the complex envelope of the output.


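Computing the Fourier transform of the kernels (3.5), we get

$$\mathcal{H}_{2k+1}(\omega_1, \ldots, \omega_{2k+1}) = \frac{\gamma_{2k+1}}{(2k+1)!}\, H''(\omega_1 + \cdots + \omega_{2k+1}) \prod_{r=1}^{2k+1} H'(\omega_r) \tag{3.60}$$

Introduce now the equivalent low-pass transfer functions of the two filters, writing

$$H'(\omega) = \tilde{H}'(\omega-\omega_0) + \tilde{H}'^*(-\omega-\omega_0)$$
$$H''(\omega) = \tilde{H}''(\omega-\omega_0) + \tilde{H}''^*(-\omega-\omega_0) \tag{3.61}$$

Inserting (3.61) into (3.60), we get

$$\mathcal{H}_{2k+1}(\omega_1, \ldots, \omega_{2k+1}) = \frac{\gamma_{2k+1}}{(2k+1)!} \left[ \tilde{H}''(\omega_1 + \cdots + \omega_{2k+1} - \omega_0) + \tilde{H}''^*(-\omega_1 - \cdots - \omega_{2k+1} - \omega_0) \right] \prod_{r=1}^{2k+1} \left\{ \tilde{H}'(\omega_r-\omega_0) + \tilde{H}'^*(-\omega_r-\omega_0) \right\} \tag{3.62}$$

If the products are computed, we obtain a sum of $2^{2k+2}$ terms, only one of which will give rise to a factor $\exp\{-j\omega_0(\tau_1 + \cdots + \tau_k - \tau_{k+1} - \cdots - \tau_{2k+1})\}$ after the inverse Fourier transform is taken. This term is

$$\frac{\gamma_{2k+1}}{(2k+1)!}\, \tilde{H}''(\omega_1 + \cdots + \omega_{2k+1} - \omega_0) \prod_{r=1}^{k} \tilde{H}'^*(-\omega_r-\omega_0) \prod_{s=k+1}^{2k+1} \tilde{H}'(\omega_s-\omega_0) \tag{3.63}$$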


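as can be seen by computing its inverse Fourier transform, which is

$$\frac{\gamma_{2k+1}}{(2k+1)!}\, e^{-j\omega_0(\tau_1 + \cdots + \tau_k - \tau_{k+1} - \cdots - \tau_{2k+1})} \int_{-\infty}^{\infty} \tilde{h}''(\tau) \prod_{r=1}^{k} \tilde{h}'^*(\tau_r-\tau) \prod_{s=k+1}^{2k+1} \tilde{h}'(\tau_s-\tau)\, d\tau$$

where $\tilde{h}'(\cdot)$, $\tilde{h}''(\cdot)$ are the equivalent low-pass impulse responses of the linear filters.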


So,  using  (3.50),  we  finally  obtain 


It can be observed that

(i) if $H'(\cdot)$ and $H''(\cdot)$, the transfer functions of the two filters, are symmetric around the center frequency $\omega_0$, then the impulse responses $\tilde{h}'(\cdot)$ and $\tilde{h}''(\cdot)$ are real functions, and so is the integral (3.64),

(ii) if the Volterra kernels are not real, they are not symmetric functions of their arguments; only if their first $k$, or last $k+1$, arguments are permuted is the value of the kernels unchanged.


Let us now turn our attention to the memoryless nonlinear device of Fig. 3.3. In order to describe this device through an input-output relationship involving complex envelopes, let us observe that the bandpass Volterra series expansion for it can be obtained simply by letting $\tilde{h}'(\cdot) = \tilde{h}''(\cdot) = \delta(\cdot)$ in (3.64) and using (3.51):

(Notice that only the odd-order terms of the series expansion (3.1) are involved in (3.65). The even-order terms give rise to spectral zones of the output signal in which we are not interested.)


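Letting

$$\tilde{x}(t) = A(t)\, e^{j\theta(t)} \tag{3.66}$$

we can rewrite (3.65) as

$$\tilde{y}(t) = e^{j\theta(t)} \sum_{k=0}^{\infty} \frac{\gamma_{2k+1}}{k!\,(k+1)!\,2^{2k+1}}\, A^{2k+1}(t) \tag{3.67}$$

Since in general the output of a bandpass nonlinear device can be represented as

$$\tilde{y}(t) = F(A)\, e^{j[\theta(t) + \phi(A)]} \tag{3.68}$$

(see (3.39)), comparing (3.68) and (3.67) we get

$$F(A)\, e^{j\phi(A)} = \sum_{k=0}^{\infty} \frac{\gamma_{2k+1}}{k!\,(k+1)!\,2^{2k+1}}\, A^{2k+1} \tag{3.69}$$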


For  example,  if  the  LHS  is  represented  using  a power  series: 


F(A)e 


we  easily  get 


j 4>  (A) 


k .k 


Y2k+1 


jUc+Tj  2k+l 


(3.69) 


(3.70) 


(3.71) 
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Similar ly,  the  LHS  of  (3.70)  can  be  represented  as 

F(A)ej»(A)  - f bt  J^aA)  (3.72) 

where  a is  a real  constant,  are  complex  nunbers,  and  the  6s's 

can  take  the  value  6£=5t  (see,  e.g.,  Shimbo,  1976)  or  are  zeros  of 
JlOO  (Lindsey  et  al,  1977).  In  this  case,  recalling 

00  ( nm  2m+l 

j (x)  = l -1-1)  * 

1 m=0  m!(m+l)!22m  1 


we  get  the  simple  result 


v _ 2k+l  r , • 2k+l 
Y2k+1  ~ a 1 M 


1=1 


n 


(3.73) 
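The relation between the Bessel-function fit and the power-series coefficients can be checked numerically. The sketch below assumes the readings $F(A)e^{j\phi(A)} = \sum_\ell b_\ell J_1(\alpha\beta_\ell A)$ for (3.72), $\gamma_{2k+1} = (-1)^k \alpha^{2k+1} \sum_\ell b_\ell \beta_\ell^{2k+1}$ for (3.73), and the series with denominators $k!(k+1)!2^{2k+1}$ for (3.69); the values of `alpha`, `b` and `beta` are hypothetical, and $J_1$ is computed from Bessel's integral so that only NumPy is needed.

```python
import numpy as np
from math import factorial


def bessel_j1(x, num_points=2000):
    """J_1(x) = (1/pi) * integral over [0, pi] of cos(tau - x sin(tau)), midpoint rule."""
    tau = (np.arange(num_points) + 0.5) * np.pi / num_points
    return float(np.mean(np.cos(tau - x * np.sin(tau))))


alpha = 0.8                          # hypothetical real constant
b = [1.0 + 0.2j, -0.3 + 0.1j]        # hypothetical complex coefficients b_l
beta = [1.0, 2.0]                    # beta_l = l, as in the Shimbo-type expansion
A = 1.2


def gamma(k):
    """Coefficient gamma_{2k+1} under the assumed reading of (3.73)."""
    return (-1) ** k * alpha ** (2 * k + 1) * sum(
        bl * bt ** (2 * k + 1) for bl, bt in zip(b, beta))


# Power series, as in (3.69), truncated after 25 terms.
series = sum(gamma(k) * A ** (2 * k + 1)
             / (factorial(k) * factorial(k + 1) * 2 ** (2 * k + 1))
             for k in range(25))

# Direct evaluation of the Bessel expansion, as in (3.72).
direct = sum(bl * bessel_j1(alpha * bt * A) for bl, bt in zip(b, beta))
```

Agreement between `series` and `direct` only confirms that the three formulas are mutually consistent under these readings.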


4.  ANALYSIS OF DIGITAL COMMUNICATION SYSTEMS
OPERATING ON NONLINEAR CHANNELS

4.1  INTRODUCTION 

In  this  chapter,  some  of  the  results  obtained  before  will  be  used  to 
evaluate  the  performance  of  a digital  communication  system  operating  on  a 
nonlinear  channel  with  memory. 

First,  a Markov  chain  model  will  be  derived  for  the  output  of  a non- 
linear channel  whose  input  is  a digital  signal.  Then,  this  model  will  be 
used  to  evaluate  the  power  spectrum  of  such  an  output,  and  to  derive  the 
structure  of  the  optimum  (maximum- likelihood)  receiver. 

Finally,  a Markov  chain  model  will  be  derived  for  the  discrete  chan- 
nel created  by  this  communication  situation. 

4.2  A MARKOV  CHAIN  MODEL  FOR  THE  CHANNEL  OUTPUT 

Assume  first,  for  simplicity's  sake,  that  the  nonlinear  channel  can 
be  modeled  as  in  Chapter  3,  and  consider  the  channel  output  when  a linearly 
modulated  digital  signal  is  sent  at  its  input.  This  output  is  given  by 
(3.8)  or  (3.58),  according  to  whether  the  channel  is  baseband  or  bandpass. 

Let us assume that the memory of the system is finite. In the Volterra series model, this is equivalent to assuming that all the Volterra kernels that describe the channel are zero -- or reasonably close to zero -- when at least one of their arguments, say $\tau_i$, takes values outside the interval $(\Theta_1, \Theta_2)$. (Notice that $\Theta_1, \Theta_2$ are not dependent on index $i$, due to the symmetry of the kernels.)

Under this assumption, at any given instant $t$ the output of the channel will depend only on a finite number, say $L$, of symbols $c_n$. In fact, all summations with indices $n_i$ in (3.8) and (3.58) will involve a finite number of terms. Thus, we can say that the channel output $v(t)$ is some function of the type

$$v(t) = V(t, c_{\ell_1(t)}, \ldots, c_{\ell_L(t)}) \tag{4.1}$$

where $\ell_1(t), \ldots, \ell_L(t)$ are integer numbers dependent on the value of $t$.

Furthermore, we can say that, if we observe the channel output $v(t)$ for $T$ seconds, this waveform will take on only a finite number of possible shapes. In fact, as $t$ ranges over any finite interval, the integers $\ell_1(t), \ldots, \ell_L(t)$ will take different values, but still in some finite range.

In general, we can say that, if the symbol sequence $(c_n)_{n=-\infty}^{\infty}$ is a stationary random process, observing $v(t)$ for $T$ seconds will give rise to a finite number, say $M'$, of different waveforms. If $L$ denotes the number of values taken by the random variables $c_n$, and $\ell$ is the number of symbols on which $v(t)$ depends as $t$ runs in an interval of length $T$, we will get

$$M' \le L^{\ell} \tag{4.2}$$

(the possible inequality accounts for the situation in which the sequence $(c_n)$ is coded).

In conclusion, at the output of the channel, and assuming for the moment that there is no noise, we shall get a situation similar to that occurring at the output of the modulator in the channel model analyzed in Chapter 2. Every $T$ seconds we shall get a waveform chosen in some finite set $\{q(t;i)\}_{i=1}^{M'}$, $t \in (0,T)$, so that we can write

$$v(t) = \sum_{n=-\infty}^{\infty} q(t-nT; \zeta_n) \tag{4.3}$$

where $(\zeta_n)_{n=-\infty}^{\infty}$ is a sequence of random variables taking values in the set $\{1, 2, \ldots, M'\}$.

We can analyze this communication situation provided that we are able to determine the statistics of the discrete-time random process $(\zeta_n)_{n=-\infty}^{\infty}$. This is accomplished by assuming, for notational simplicity, that as $t$ runs in the interval $[kT, (k+1)T]$, $v(t)$ depends on $c_k, \ldots, c_{k+\ell-1}$.

Therefore, $\zeta_k$ will be a function of the same random variables. We can also assume, without loss of generality, that $\zeta_k$ is a one-to-one function of these random variables.

Consider now the time interval $[(k+1)T, (k+2)T]$. Here $v(t)$, and hence $\zeta_{k+1}$, will depend on $c_{k+1}, \ldots, c_{k+\ell}$, and so on for other time intervals. The following conclusion can be drawn immediately. Consider $P\{\zeta_{k+1} \mid \zeta_k, \zeta_{k-1}, \ldots\}$: $\zeta_{k+1}$ is a one-to-one function of $c_{k+1}, \ldots, c_{k+\ell}$, whereas $\zeta_k, \zeta_{k-1}, \ldots$ (i.e., the past of $\zeta_{k+1}$) is a one-to-one function of $c_{k+\ell-1}, c_{k+\ell-2}, \ldots$.

Since $(c_k)_{k=-\infty}^{\infty}$ is a Markov chain, the values taken by $c_{k+1}, \ldots, c_{k+\ell}$ will depend only on $c_k, \ldots, c_{k+\ell-1}$. This means that the values taken by $\zeta_{k+1}$ depend only on the value taken by $\zeta_k$, and not on those taken by $\zeta_{k-1}, \zeta_{k-2}, \ldots$. In other words, the sequence $(\zeta_n)_{n=-\infty}^{\infty}$ forms a Markov chain.

To  compute  the  transition  probability  matrix  of  this  chain,  assume 
that  £k  takes  value  i when  ' Vii'-'V/.f  iy.j  • 


Eq.(4.4) also shows that, provided that the Markov chain $(c_n)_{n=-\infty}^{\infty}$ is homogeneous, so is the Markov chain $(\zeta_n)_{n=-\infty}^{\infty}$.

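Example

Assume $\ell = 3$, and a binary signal entering the channel. Under the further assumption that this signal has not been encoded, the random variables $c_n$ turn out to be independent and equally likely, so that the process $(c_n)_{n=-\infty}^{\infty}$ is described by the trivial transition probability matrix:

$$\frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

The transition probability matrix of $(\zeta_n)_{n=-\infty}^{\infty}$ is 8x8. Labeling its rows and columns by the corresponding values of the triplets $c_1 c_2 c_3$, in the order 000, 001, 010, 011, 100, 101, 110, 111, we get

$$P = \frac{1}{2} \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}$$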
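The construction of this matrix generalizes to any memory length and any first-order symbol chain. The sketch below is an illustration under the assumption that $(c_n)$ is a first-order Markov chain with transition matrix `P_c`; the function name `window_chain_matrix` is hypothetical.

```python
import numpy as np
from itertools import product


def window_chain_matrix(P_c, ell):
    """Transition matrix of the chain of windows (c_k, ..., c_{k+ell-1}).

    P_c[a, b] = Pr{c_{k+1} = b | c_k = a} for the symbol chain (assumed first-order).
    States are listed in lexicographic order, e.g. 000, 001, 010, ... for ell = 3.
    """
    M = P_c.shape[0]
    states = list(product(range(M), repeat=ell))
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        for b in range(M):
            nxt = s[1:] + (b,)            # the window slides by one symbol
            P[index[s], index[nxt]] = P_c[s[-1], b]
    return states, P


# Independent, equally likely binary symbols and memory ell = 3.
states, P = window_chain_matrix(np.full((2, 2), 0.5), 3)
```

Each row has only as many nonzero entries as there are symbol values, which is the sparsity that later makes trellis-based processing attractive.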


4.3  COMPUTATION  OF  THE  POWER  SPECTRUM 


Using the Markov chain model derived in §4.2, several results about the digital communication system at hand can be obtained in a conceptually straightforward way.

One of these results is the computation of the power spectrum of the channel output. Another, the derivation of the optimum receiver, is discussed in the next section.

Recalling the results described in §2.6, we can see that the spectrum of the digital signal $v(t)$, the output of a nonlinear channel with memory, can be computed using eqs. (2.67)-(2.69) (Ajmone, Biglieri and Elia, 1978). The only hindrance to the application of this method is created by the large dimensionality of the transition probability matrix of the Markov chain $(\zeta_n)$; this can seriously impair the usefulness of this technique when the memory of the channel is considerably long.


In any case, several shortcuts can simplify this procedure in order to make it practical. The simplest, and most obvious, is to take advantage of any linear system that can possibly be present at the end of the channel.

Suppose that the nonlinear channel can be split into the cascade of a nonlinear subchannel with memory and a linear system with transfer function $H(\omega)$. This is the case of any practical communication system, where the demodulator is preceded by the receiver filter. Under these conditions, we can compute the power spectrum before this linear system, and then multiply it by $|H(\omega)|^2$.

Another shortcut tries to take advantage of the structure of the matrix $P$. Consider for example the problem of computing $\Lambda(\omega)$ as defined in eq.(2.68). This can be done (see Appendix B) by using either Faddeev's algorithm or the closed-form solution (B.6)-(B.10) when the minimum polynomial of the matrix $P$ is known. The latter technique is more useful in this case. In fact, it can be proved (see Ajmone, Biglieri and Elia, 1978) that the minimum polynomial of $P$ is simply given by $\lambda^{\ell-1}\, \Gamma(\lambda)$, $\Gamma(\lambda)$ being the minimum polynomial of the transition probability matrix of the chain $(c_n)$. This fact allows a closed-form solution to be obtained for $\Lambda(\omega)$ involving the parameters which depend only on the statistics of $(c_n)$.
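The factorization of the minimum polynomial can be checked on the binary example of §4.2. The sketch below takes the exponent of $\lambda$ to be the memory length minus one (an assumption of this sketch, since the exponent is hard to read in the source) and uses $\Gamma(\lambda) = \lambda(\lambda - 1)$, the minimum polynomial of the 2x2 matrix with all entries equal to 1/2. It only verifies consistency for this one case.

```python
import numpy as np

ell = 3
# Window chain for independent, equally likely binary symbols: state i = (c1 c2 c3) in binary.
P = np.zeros((8, 8))
for i in range(8):
    for b in (0, 1):
        P[i, (2 * i) % 8 + b] = 0.5

I = np.eye(8)
power = np.linalg.matrix_power

# Candidate minimum polynomial lambda^(ell-1) * lambda * (lambda - 1) evaluated at P.
annihilated = power(P, ell - 1) @ P @ (P - I)

# Dropping one factor lambda, or the factor (lambda - 1), should no longer annihilate P.
one_lambda_fewer = power(P, ell - 1) @ (P - I)
without_unit_root = power(P, ell)
```

Here `annihilated` is the zero matrix while the two lower-degree candidates are not, as expected if the stated polynomial is indeed minimal for this example.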

Similarly,  the  solution  of  system  (2.6)  is  generally  required  to 
get  II  and  P°°  (see  (2.58)  and  (2.62)).  This  can  become  difficult  as  the 
dimension  of  the  matrix  P increases.  Actually,  this  computation  can  be 
reduced to a simpler one by taking advantage of the relation (4.4).

4.4  OPTIMUM (MAXIMUM LIKELIHOOD) RECEIVER

We shall now derive the structure of the maximum likelihood receiver for a situation in which the channel consists of a known finite memory part followed by a noisy memoryless part. In particular, we can assume that the received signal is the sum of a signal like (4.3) plus white Gaussian noise. This situation has been considered in a general, abstract setting by Omura (1971), and for the specific problem of a bandpass nonlinear channel by Mesiya et al. (1977).

Since the channel is assumed to have a finite memory, the signal received at time $t$ can be written as

$$z(t) = \sum_{n=n_1(t)}^{n_2(t)} q(t-nT; \zeta_n) + \nu(t) \tag{4.5}$$

where $n_1(t)$ and $n_2(t)$, $n_2(t) > n_1(t)$, are integers dependent upon the actual value of $t$, and $\nu(t)$ is the noise.

Assume now that $z(t)$ is observed over the time period $0 \le t < KT$, say. Denoting by $N_1$ and $N_2$ the following integers:

$$N_1 = \min_{0 \le t < KT} n_1(t), \qquad N_2 = \max_{0 \le t < KT} n_2(t) \tag{4.6}$$

we see that the observation will depend on the values taken by the random variables $\zeta_{N_1}, \ldots, \zeta_{N_2}$. This sequence of random variables may take one of

$$(M')^{N_2-N_1+1} \tag{4.7}$$

possible states, each state corresponding to a received waveform like

$$v_{\mathbf{m}}(t) = \sum_{n=N_1}^{N_2} q(t-nT; m_n), \qquad 0 \le t < KT \tag{4.8}$$

where $\mathbf{m} = (m_{N_1}, \ldots, m_{N_2})$ is an integer sequence denoting a possible state taken by the sequence of random variables $\zeta_{N_1}, \ldots, \zeta_{N_2}$.

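Compute now the log-likelihood ratio for $\mathbf{m}$; we get

$$\Lambda_{\mathbf{m}} = \frac{2}{N_0} \int_0^{KT} v_{\mathbf{m}}(t)\, z(t)\, dt - \frac{1}{N_0} \int_0^{KT} v_{\mathbf{m}}^2(t)\, dt \tag{4.9}$$

Using (4.8), it follows that

$$\Lambda_{\mathbf{m}} = \frac{2}{N_0} \sum_{n=N_1}^{N_2} \int_0^{KT} q(t-nT; m_n)\, z(t)\, dt - \frac{1}{N_0} \sum_{\ell=N_1}^{N_2} \sum_{n=N_1}^{N_2} \int_0^{KT} q(t-\ell T; m_\ell)\, q(t-nT; m_n)\, dt \tag{4.10}$$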


Notice now that, under our hypotheses, $q(\cdot\,;\cdot)$ has a finite duration $T$. Assuming that $K$ is large enough so that we can disregard end effects, we have

$$\int_0^{KT} q(t-nT; m_n)\, z(t)\, dt = \int_{nT}^{(n+1)T} q(t-nT; m_n)\, z(t)\, dt \tag{4.11}$$

Similarly we can observe that, owing to the finite duration of $q(\cdot\,;\cdot)$,

$$\int_0^{KT} q(t-\ell T; m_\ell)\, q(t-nT; m_n)\, dt = \begin{cases} 0, & \ell \neq n \\ \displaystyle\int_0^{T} q^2(t; m_n)\, dt, & \ell = n \end{cases} \tag{4.12}$$


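Thus, defining

$$\alpha_n(m_n) \triangleq \int_{nT}^{(n+1)T} q(t-nT; m_n)\, z(t)\, dt \tag{4.13}$$

and

$$\mathcal{E}(m_n) \triangleq \int_0^{T} q^2(t; m_n)\, dt \tag{4.14}$$

we finally get

$$\Lambda_{\mathbf{m}} = \frac{2}{N_0} \sum_{n=N_1}^{N_2} \alpha_n(m_n) - \frac{1}{N_0} \sum_{n=N_1}^{N_2} \mathcal{E}(m_n) = \frac{1}{N_0} \sum_{n=N_1}^{N_2} \left[ 2\alpha_n(m_n) - \mathcal{E}(m_n) \right] \tag{4.15}$$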


We can observe that:

(i) $\alpha_n(m_n)$ can be obtained as the output, sampled at time $(n+1)T$, of a filter matched to $q(t; m_n)$ to the input $z(t)$, $nT < t < (n+1)T$.

(ii) $\mathcal{E}(m_n)$ is the energy of the waveform $q(t; m_n)$.
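The decomposition of the likelihood into per-interval terms can be illustrated on sampled waveforms. The sketch below is a discrete-time approximation under stated assumptions: the waveform set `Q`, the noise level `N0`, the sampling density and the transmitted sequence `m_true` are all hypothetical, and integrals are replaced by Riemann sums.

```python
import numpy as np

rng = np.random.default_rng(1)
T, samples_per_T, N0 = 1.0, 50, 0.5
dt = T / samples_per_T
t = np.arange(samples_per_T) * dt

# Hypothetical waveform set {q(t; i)}, each of duration T (three waveforms here).
Q = np.stack([np.sin(np.pi * t / T), np.cos(np.pi * t / T), np.ones_like(t)])

m_true = [0, 2, 1, 1]
z = np.concatenate([Q[i] for i in m_true])
z = z + rng.normal(scale=np.sqrt(N0 / (2 * dt)), size=z.size)   # sampled white noise


def metric_direct(m):
    """Correlation of the whole candidate waveform with z, minus its energy, as in (4.9)."""
    v = np.concatenate([Q[i] for i in m])
    return (2 / N0) * np.sum(v * z) * dt - (1 / N0) * np.sum(v * v) * dt


def metric_branchwise(m):
    """Sum of per-interval terms [2 alpha_n(m_n) - E(m_n)] / N0, as in (4.15)."""
    total = 0.0
    for n, i in enumerate(m):
        segment = z[n * samples_per_T:(n + 1) * samples_per_T]
        alpha = np.sum(Q[i] * segment) * dt      # matched-filter output at (n+1)T
        energy = np.sum(Q[i] ** 2) * dt          # energy of q(t; i)
        total += (2 * alpha - energy) / N0
    return total
```

The two functions agree for every candidate sequence, so a receiver needs only the bank of matched-filter outputs and the stored energies, not the full waveforms.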

The maximum likelihood sequence decoding rule now requires $\Lambda_{\mathbf{m}}$ to be maximized over the set of possible sequences $\mathbf{m}$. Observe now the structure of $\Lambda_{\mathbf{m}}$ given by eq.(4.15) which, with obvious notations, we can rewrite as

$$\Lambda_{\mathbf{m}} = \sum_{n=N_1}^{N_2} \lambda_n(m_n) \tag{4.16}$$

It is seen that $\Lambda_{\mathbf{m}}$ is a function of the vector $\mathbf{m}$ through a sum of functions of its coordinates.

This observation forms the basis for the application of the Viterbi algorithm to the solution of this demodulation problem. In fact, the hypothesis that the process $(\xi_n)$ forms a Markov chain means in particular that the set of values that each $\xi_n$ is allowed to take depends on the future values $\xi_{n+1}, \xi_{n+2}, \ldots$ only through $\xi_{n+1}$. Since we have denoted these values by $m_n$, this is equivalent to saying that the allowable values for $m_n$ will depend on the value of the other components $m_{n+1}, m_{n+2}, \ldots$ of $m$ only through $m_{n+1}$.

This assumption is crucial in order to allow the Viterbi algorithm to be applied to the solution of this maximization problem.

To see why this is true, consider for notational simplicity the case $N_1 = 1$, $N_2 = N$. Then, the demodulation problem is equivalent to finding

$$\mu = \max_{m_1, \ldots, m_N} \sum_{n=1}^{N} \lambda_n(m_n) \tag{4.17}$$

i.e., maximizing a function of $N$ arguments made up of the sum of $N$ functions, each of them dependent on only one of the arguments.

Denote by

$$m_i \to (m_{i+1}, m_{i+2}, \ldots) \tag{4.18}$$

the values that $m_i$ is allowed to take under the constraint that the following components of $m$ take values $m_{i+1}, m_{i+2}, \ldots$ . The maximization problem can be solved sequentially as follows:


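$$\begin{aligned}
\mu &= \max_{m_N}\; \max_{m_{N-1} \to (m_N)} \cdots \max_{m_1 \to (m_2,\ldots,m_N)} \sum_{i=1}^{N} \lambda_i(m_i) \\
&= \max_{m_N} \Big\{ \cdots \max_{m_2 \to (m_3,\ldots,m_N)} \Big\{ \max_{m_1 \to (m_2,\ldots,m_N)} \lambda_1(m_1) + \lambda_2(m_2) \Big\} + \cdots + \lambda_N(m_N) \Big\}
\end{aligned} \tag{4.19}$$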


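This can be written recursively as follows:

$$\begin{aligned}
\mu_2(m_2,\ldots,m_N) &= \max_{m_1 \to (m_2,\ldots,m_N)} \lambda_1(m_1) \\
\mu_3(m_3,\ldots,m_N) &= \max_{m_2 \to (m_3,\ldots,m_N)} \{\mu_2(m_2,\ldots,m_N) + \lambda_2(m_2)\} \\
\mu_4(m_4,\ldots,m_N) &= \max_{m_3 \to (m_4,\ldots,m_N)} \{\mu_3(m_3,\ldots,m_N) + \lambda_3(m_3)\} \\
&\;\;\vdots \\
\mu_N(m_N) &= \max_{m_{N-1} \to m_N} \{\mu_{N-1}(m_{N-1}, m_N) + \lambda_{N-1}(m_{N-1})\} \\
\mu &= \max_{m_N} \{\mu_N(m_N) + \lambda_N(m_N)\}
\end{aligned} \tag{4.20}$$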


In words, fix first $m_2, \ldots, m_N$ and maximize $\Lambda_m$ with respect to the values of $m_1$ that can lead to those values of $m_2, \ldots, m_N$. Due to the structure of $\Lambda_m$, this is equivalent to maximizing $\lambda_1(m_1)$ alone. This must be done for all possible $(N-1)$-tuples $m_2, \ldots, m_N$, and results in a function of $m_2, \ldots, m_N$ that we call $\mu_2$.

Fix now $m_3, \ldots, m_N$ and take the maximum of $\Lambda_m$, with $m_1$ equal to the value previously obtained, with respect to the values of $m_2$ that can lead to those $m_3, \ldots, m_N$. This is equivalent to maximizing $\mu_2(m_2, \ldots, m_N) + \lambda_2(m_2)$, since this is the only part of the function that does depend on $m_2$. This results in a function of $m_3, \ldots, m_N$ that we call $\mu_3$, and so on.

Simplifications  of  this  basic  algorithm  are  possible,  depending  on 
the  structure  of  the  range  of  the  vector  m . The  simplest  possible  case 
arises  when,  for  all  i , we  have 


{m.  : m.->(m.+1,m  i+2»***)}  ( 

i.e., the values that $m_i$ can take do not depend on the values of $m_{i+1}, m_{i+2}, \ldots$ . In our decoding problem, this situation occurs when the random variables $(\xi_n)$ are independent. In this case, we have

$$\mu = \max_{m_N} \lambda_N(m_N) + \max_{m_{N-1}} \lambda_{N-1}(m_{N-1}) + \cdots + \max_{m_1} \lambda_1(m_1) \tag{4.22}$$

which corresponds to bit-by-bit decoding.

The second simplest case arises when, for all $i$, we have

$$\{m_i : m_i \to (m_{i+1}, m_{i+2}, \ldots)\} = \{m_i : m_i \to m_{i+1}\} \tag{4.23}$$

In words, the values that $m_i$ can take depend only on the next coordinate.
This situation occurs when the random variables $(\xi_n)$ are a Markov chain,


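where (4.21) becomes

$$\begin{aligned}
\mu_2(m_2) &= \max_{m_1 \to m_2} \lambda_1(m_1) \\
\mu_3(m_3) &= \max_{m_2 \to m_3} \{\mu_2(m_2) + \lambda_2(m_2)\} \\
\mu_4(m_4) &= \max_{m_3 \to m_4} \{\mu_3(m_3) + \lambda_3(m_3)\} \\
&\;\;\vdots \\
\mu_N(m_N) &= \max_{m_{N-1} \to m_N} \{\mu_{N-1}(m_{N-1}) + \lambda_{N-1}(m_{N-1})\} \\
\mu &= \max_{m_N} \{\mu_N(m_N) + \lambda_N(m_N)\}
\end{aligned} \tag{4.24}$$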

Notice that sometimes further simplifications can occur. For example, if $\mu_i(m_i)$ does not depend on $m_i$, i.e., if all the values of $m_i$ give rise to the same value for $\mu_i(m_i)$, the following iterations can be simplified taking advantage of the fact that $\mu_i$ is now a constant.
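A minimal Python sketch of the Markov-case recursion may help fix ideas. It assumes a finite set of values common to all components, a per-step metric `metric(n, s)` standing for $\lambda_n(m_n = s)$, and a predicate `allowed(s, s_next)` telling whether the value `s` can lead to `s_next`; all of these names are hypothetical and introduced only for illustration. The last step adds the metric of the final component before the outer maximization.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

State = Hashable


def viterbi_max(
    states: Sequence[State],
    n_steps: int,
    metric: Callable[[int, State], float],
    allowed: Callable[[State, State], bool],
) -> Tuple[float, List[State]]:
    """Maximize sum_{n=1..N} metric(n, m_n) subject to allowed(m_n, m_{n+1})."""
    neg_inf = float("-inf")
    # mu[s] plays the role of mu_n(m_n = s); mu_1 is identically zero.
    mu: Dict[State, float] = {s: 0.0 for s in states}
    back: List[Dict[State, Optional[State]]] = []
    for n in range(1, n_steps):
        new_mu: Dict[State, float] = {}
        pointers: Dict[State, Optional[State]] = {}
        for s_next in states:
            best_val, best_prev = neg_inf, None
            for s in states:
                if not allowed(s, s_next):
                    continue
                val = mu[s] + metric(n, s)
                if val > best_val:
                    best_val, best_prev = val, s
            new_mu[s_next] = best_val      # mu_{n+1}(s_next)
            pointers[s_next] = best_prev   # survivor leading to s_next
        mu = new_mu
        back.append(pointers)
    # Outer maximization over the last component, including its own metric.
    last = max(states, key=lambda s: mu[s] + metric(n_steps, s))
    best = mu[last] + metric(n_steps, last)
    if best == neg_inf:
        raise ValueError("no admissible sequence")
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best, path
```

The work grows linearly with the sequence length and quadratically with the number of values per component, instead of exponentially as in an exhaustive search over all sequences.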

Recursions (4.24) are known as the Viterbi Algorithm (see, e.g., Viterbi and Omura, 1977). The performance of such an optimum receiver can also be evaluated. Upper bounds to the probability of an error event, or to the bit error probability, can be computed (see Mesiya et al., 1977, or Viterbi and Omura, 1977), depending on the set of distances

$$d^2(m, m') \triangleq \int_0^{KT} \left[v_m(t) - v_{m'}(t)\right]^2 dt. \tag{4.25}$$


4.5  A MARKOV CHAIN MODEL FOR THE NOISY CHANNEL

Assume that the channel output is fed into a demodulator, whose output is a sequence of symbols $(r_k)$. To account for possible soft-decision demodulation, we can assume that $r_k$ takes values in the set $\{1, 2, \ldots, M''\}$, where $M'' \ge M$, and $M$ is the number of values taken by the random variables $a_n$, the information source outputs.

Thus,  we  can  think  of  the  system  including  encoder,  modulator, 
continuous  channel  and  demodulator  as  a discrete  channel,  with  inputs 
{1,...,R} and outputs $\{1, \ldots, M''\}$ (see Chapter 1 for notations). This channel is generally not memoryless, due to the presence of the coder and to the effects of the continuous channel.

To  characterize  this  discrete  channel,  we  shall  build  a Markov  chain 
model  of  it.  This  model  is  derived  from  that  obtained  in  §4.2,  with  the 
only addition of noise. Recall that the model of §4.2 describes the channel output by the sequence of states $(\xi_n)$.

If the values taken by $\xi_n$ were perfectly known, then the demodulation process would entail no error. The presence of noise at the channel output implies that errors can be made. Clearly, the probability of making an error in the demodulation process will depend on the actual value of $\xi_n$, as well as on the noise.

In particular, assuming for simplicity that the demodulator is memoryless, its operation will be described through the function

$$r_n = \delta(\xi_n, \nu_n), \qquad -\infty < n < \infty \tag{4.26}$$

where $\nu_n$ represents the effect of the noise on the demodulation when the channel state is $\xi_n$.
Define now the error sequence $(e_n)$ as follows:

$$\delta(\xi_n, \nu_n) = \delta(\xi_n, 0) \oplus e_n \tag{4.27}$$

where $\oplus$ denotes addition mod $M''$.

As far as the effect of the noise statistics on this model is concerned, I shall assume that the random variable $e_n$ is independent of $\xi_j$, $j \ne n$, and of $e_j$, $j \ne n$.

Define now a two-dimensional, discrete-time random process as the sequence $(\xi_k, e_k)_{k=-\infty}^{\infty}$. This process, under our hypotheses, is a Markov chain with $M' \cdot M''$ states. In fact, define

$$\mathcal{P}(k, k-1, k-2, \ldots) \triangleq \Pr\{(\xi_k, e_k) \mid (\xi_{k-1}, e_{k-1}), (\xi_{k-2}, e_{k-2}), \ldots\} = \Pr\{\xi_k, e_k \mid \xi_{k-1}, \xi_{k-2}, \ldots, e_{k-1}, e_{k-2}, \ldots\} \tag{4.28}$$

Under our assumptions, the value taken by $e_k$ will not depend on the values taken by $e_{k-1}, e_{k-2}, \ldots$; thus

$$\mathcal{P}(k, k-1, k-2, \ldots) = \Pr\{\xi_k, e_k \mid \xi_{k-1}, \xi_{k-2}, \ldots\} = \Pr\{\xi_k \mid \xi_{k-1}, \xi_{k-2}, \ldots\}\, \Pr\{e_k \mid \xi_k, \xi_{k-1}, \xi_{k-2}, \ldots\} \tag{4.29}$$

Since $(\xi_k)_{k=-\infty}^{\infty}$ is a Markov chain, we get

$$\mathcal{P}(k, k-1, k-2, \ldots) = \Pr\{\xi_k \mid \xi_{k-1}\}\, \Pr\{e_k \mid \xi_k, \xi_{k-1}, \ldots\} \tag{4.30}$$

and finally, since $e_k$ depends only on the actual state $\xi_k$,

$$\mathcal{P}(k, k-1, k-2, \ldots) = \Pr\{\xi_k \mid \xi_{k-1}\}\, \Pr\{e_k \mid \xi_k\} \tag{4.31}$$

In (4.31), indices $k-2, k-3, \ldots$ do not appear, so the pair $(\xi_k, e_k)$ is independent of $(\xi_{k-2}, e_{k-2}), (\xi_{k-3}, e_{k-3}), \ldots$ . This is equivalent to stating that our process forms a Markov chain. The transition probabilities of this Markov chain are obtained by the previous computations:

$$\Pr\{(\xi_k, e_k) = (j, e) \mid (\xi_{k-1}, e_{k-1}) = (i, e')\} = \Pr\{\xi_k = j \mid \xi_{k-1} = i\}\, \Pr\{e_k = e \mid \xi_k = j\} \tag{4.32}$$

Let $P$ denote the matrix whose entries are

$$p_{ij} \triangleq \Pr\{\xi_k = j \mid \xi_{k-1} = i\}, \qquad i, j = 1, \ldots, M' \tag{4.33}$$

and let

$$q_{ie} \triangleq \Pr\{e_k = e \mid \xi_k = i\}, \qquad i = 1, \ldots, M', \quad e = 1, \ldots, M'' \tag{4.34}$$

(probability that the error is $e$ when the channel output state is $i$).

Then the Markov chain so constructed has transition matrix
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where 


\ .q?  .•••»q«  ] 

o i V 


(4.36) 
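As an illustration of (4.32), the following Python sketch assembles the transition matrix of the joint chain from the channel-state matrix of (4.33) and the error probabilities of (4.34). The function name, the zero-based indices, and the ordering of the joint states (state index first, error index second) are assumptions made only for this example; the numerical values used with it are likewise hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def joint_transition_matrix(P: Matrix, Q: Matrix) -> Matrix:
    """Transition matrix of the pair (state, error).

    P[i][j] = Pr{xi_k = j | xi_{k-1} = i}   (channel-state chain)
    Q[j][e] = Pr{e_k = e | xi_k = j}        (error given current state)
    The joint state (i, e) is stored at index i * n_err + e.
    """
    n_states = len(P)
    n_err = len(Q[0])
    size = n_states * n_err
    T = [[0.0] * size for _ in range(size)]
    for i in range(n_states):
        for e_prev in range(n_err):
            row = i * n_err + e_prev
            for j in range(n_states):
                for e in range(n_err):
                    # The previous error value does not affect the
                    # transition, so rows sharing the same i coincide.
                    T[row][j * n_err + e] = P[i][j] * Q[j][e]
    return T
```

Every row of the result sums to one whenever the rows of `P` and `Q` do, which gives a quick sanity check on the inputs.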


Example 

Consider for example a binary, non-coded transmission scheme with hard-decision demodulator ($M'' = 2$) and a channel with $M' = 4$. Let the matrix $P$ describing the channel behavior be


Let


and
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1 

0 
From this characterization of the noisy channel, it is possible to get a number of useful parameters, such as the burst length distribution, describing the behavior of the channel. These parameters are particularly useful when a code has to be designed for use on this channel, since the memoryless assumption cannot be accepted.

For a discussion of these parameters, and how to obtain them, see for example Fritchman (1967), Tsai (1969) or, for the best reference on this topic, the book by Blokh, Popov and Turin (1971). The model considered in this section has been used by Tatebayashi et al. (1975) to compute the probability of $\lambda$ errors in a sequence of $\mu$ symbols.

REFERENCES

M. Ajmone, E. Biglieri and M. Elia (1978). "Power spectra of digital signals after nonlinearities with memory", to be published.

E.L. Blokh, O.V. Popov, V.Ya. Turin (1971). Models of Error Generation in Channels for the Transmission of Digital Information, Izdatel'stvo Svyaz', Moscow (in Russian).

B.D. Fritchman (1967). "A binary channel characterization using partitioned Markov chains", IEEE Trans. on Inform. Theory, vol. IT-13, n.2, p. 24 ff, April.

M.F. Mesiya, P.J. McLane and L.L. Campbell (1977). "Maximum likelihood sequence estimation of binary sequences transmitted over bandlimited nonlinear channels", IEEE Trans. on Commun., to be published.

J.K. Omura (1971). "Optimal receiver design for convolutional codes and channels with memory via control theoretical concepts", Inform. Sciences, vol. 3, p. 243 ff.

M. Tatebayashi, M. Kasahara and T. Namekawa (1975). "Characteristics of decoding error in discrete-memory channel", Electronics and Communications in Japan, Vol. 58-A, n.4, p. 16 ff.

S. Tsai (1969). "Markov characterization of the HF channel", IEEE Trans. on Commun. Technol., vol. COM-17, n.1, p. 24 ff, February.

A.J. Viterbi and J.K. Omura (1977). Digital Communication and Coding, to be published.


APPENDIX A:  SPECTRAL ANALYSIS OF NON-STATIONARY RANDOM PROCESSES


The problem of computing the spectrum of an energetic quantity $\Pi$ associated with a random process $x(t)$ can be stated as follows: we want to find a function $\mathcal{G}(\omega)$ such that the two following conditions hold:

(i)
$$\Pi = \frac{1}{2\pi}\int_{-\infty}^{\infty} \mathcal{G}(\omega)\,d\omega$$

(ii) Let $x(t)$ be passed through a linear, time-invariant system with transfer function $H(\omega)$. Denoting by $x'(t)$ the output of this system, the corresponding energetic quantity associated with $x'(t)$, say $\Pi'$, is given by
$$\Pi' = \frac{1}{2\pi}\int_{-\infty}^{\infty} |H(\omega)|^2\, \mathcal{G}(\omega)\,d\omega$$

If conditions (i) and (ii) are fulfilled, then $\mathcal{G}(\omega)$ is called the spectrum of $\Pi$.

As an example, for a wide-sense stationary stochastic process $x(t)$ we can define its average power as

$$\Pi = E\{|x(t)|^2\}$$

If we define

$$\mathcal{G}(\omega) \triangleq \mathcal{F}\left[E\{x(t+\tau)\,x^*(t)\}\right]$$

($\mathcal{F}$ denotes Fourier transform) it is known that (i) and (ii) hold, so that $\mathcal{G}(\omega)$ can be called the average power spectrum of $x(t)$.
We want to consider non-stationary random processes; the appropriate subclass of random processes for which we can define the spectrum of a useful energetic quantity is that of harmonizable processes (Loève, 1963). Roughly speaking, a process is harmonizable if we can define its Fourier transform; for a precise definition, see (Loève, 1963) or (Cambanis and Liu, 1970).

These processes are a first-step generalization of wide-sense stationary stochastic processes. It has been proved (Cambanis and Liu, 1970) that, under some very mild conditions, a random process obtained as an output of a linear system is harmonizable. The system may be randomly time-variant; the input process need not be stationary or even harmonizable.

For a harmonizable random process, it makes sense to look for the spectrum of the following energetic quantity:

$$\Pi \triangleq M\{E\{|x(t)|^2\}\} \tag{A.1}$$

where $M$ denotes time average. The spectrum of $\Pi$ can be obtained as follows: define first the (generalized) function

$$\Gamma(\omega_1, \omega_2) \triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{x(t+\tau)\,x^*(t)\}\; e^{-j[(\omega_1-\omega_2)t + \omega_1\tau]}\, dt\, d\tau \tag{A.2}$$

and try to express it in this form:

$$\Gamma(\omega_1, \omega_2) = 2\pi\, \mathcal{G}(\omega_1)\, \delta(\omega_1 - \omega_2) + \Gamma_\Delta(\omega_1, \omega_2) \tag{A.3}$$

where $\Gamma_\Delta(\omega_1, \omega_2)$ does not include any line masses on the bisector of the plane $(\omega_1, \omega_2)$.

Then it can be shown (Blanc-Lapierre and Fortet, 1968) that $\mathcal{G}(\omega)$ is the spectrum of $\Pi$ as defined in (A.1) (in other words, (i) and (ii) hold for $\mathcal{G}(\omega)$).
It is often useful to express $\Gamma(\omega_1, \omega_2)$ in the following form:

$$\Gamma(\omega_1, \omega_2) = E\{X(\omega_1)\,X^*(\omega_2)\} \tag{A.4}$$

where $X(\omega)$ is the Fourier transform of $x(t)$.

Example  1 (stationary  processes) 

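Let $x(t)$ be a harmonizable, wide-sense stationary process; define its autocorrelation function

$$R(\tau) = E\{x(t+\tau)\,x^*(t)\}.$$

Then

$$\Gamma(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R(\tau)\; e^{-j[(\omega_1-\omega_2)t + \omega_1\tau]}\, dt\, d\tau$$

Integrating first in $t$, we get

$$\Gamma(\omega_1, \omega_2) = \int_{-\infty}^{\infty} R(\tau)\, e^{-j\omega_1\tau}\, d\tau \cdot 2\pi\,\delta(\omega_1 - \omega_2)$$

This means that

$$\mathcal{G}(\omega) = \int_{-\infty}^{\infty} R(\tau)\, e^{-j\omega\tau}\, d\tau$$

is the average power spectrum, and $\Gamma_\Delta \equiv 0$, i.e., the function $\Gamma(\omega_1, \omega_2)$ for wide-sense stationary random processes reduces to a line mass on the bisector of the plane $(\omega_1, \omega_2)$.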


Example  2 (cyclostationary  random  processes) 


Let $x(t)$ be a harmonizable, wide-sense cyclostationary process (see, for example, Gardner and Franks, 1975). We have

$$E\{x(t+\tau)\,x^*(t)\} = E\{x(t+\tau+T)\,x^*(t+T)\} \tag{A.5}$$

Equation (A.5) can be interpreted by saying that this average, when considered as a function of $t$, is periodic with period $T$.

We can represent it as a Fourier series, i.e.:

$$E\{x(t+\tau)\,x^*(t)\} = \sum_{n=-\infty}^{\infty} c_n(\tau)\, e^{jn\Omega t}, \qquad \Omega \triangleq \frac{2\pi}{T} \tag{A.6a}$$

where

$$c_n(\tau) = \frac{1}{T}\int_0^{T} E\{x(t+\tau)\,x^*(t)\}\, e^{-jn\Omega t}\, dt \tag{A.6b}$$

Using (A.6), we get

$$\Gamma(\omega_1, \omega_2) = \sum_{n=-\infty}^{\infty} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} c_n(\tau)\, e^{jn\Omega t}\, e^{-j[(\omega_1-\omega_2)t + \omega_1\tau]}\, dt\, d\tau = 2\pi \sum_{n=-\infty}^{\infty} C_n(\omega_1)\, \delta(\omega_1 - \omega_2 - n\Omega) \tag{A.7}$$

where $C_n(\omega)$, $-\infty < n < \infty$, are the Fourier transforms of $c_n(\tau)$. The function $\Gamma(\omega_1, \omega_2)$ is thus made up of line masses located on lines parallel to the bisector of the plane $(\omega_1, \omega_2)$. Thus, using (A.3),

$$\mathcal{G}(\omega) = C_0(\omega) = \frac{1}{T}\int_0^{T}\int_{-\infty}^{\infty} E\{x(t+\tau)\,x^*(t)\}\, e^{-j\omega\tau}\, d\tau\, dt \tag{A.8}$$

and this formula, derived differently, has often been used for spectral analysis of cyclostationary processes (see, for example, Robinson et al., 1973).


APPENDIX B:  COMPUTATION OF THE MATRIX SERIES A(ω)


The problem we want to solve in this Appendix is the following: compute the matrix series

$$A(\omega) \triangleq \sum_{n=0}^{\infty} A^n\, e^{-jn\theta} \tag{B.1}$$

where

$$A \triangleq P - P^{\infty}, \tag{B.2}$$

$P$ being the $M \times M$ transition probability matrix of a homogeneous, regular Markov chain, and

$$\theta \triangleq \omega T \tag{B.3}$$

It is known that a necessary and sufficient condition for the equality

$$\sum_{n=0}^{\infty} R^n = (I - R)^{-1} \tag{B.4}$$

to hold is that all the eigenvalues of $R$ have magnitude less than 1.

It is easy to prove (see Cariolaro and Tronca, 1974) that this is the case for the matrix $A\,e^{-j\theta}$, so we can write

$$A(\omega) = (I - A\,e^{-j\theta})^{-1} \tag{B.5}$$

Thus, the matrix series $A(\omega)$ can be computed, for each value of $\theta$, by inverting a matrix. This procedure is computationally inefficient because, if we want to compute the power spectrum for several values of $\omega$, we need as many matrix inversions as values of $\omega$.
For a more efficient technique, write $A(\omega)$ in this form:

$$A(\omega) = \sum_{i=0}^{L-1} \beta_i(\theta)\, A^i \tag{B.6}$$

This is possible since, under our hypothesis, $A(\omega)$ turns out to be an analytic function of the matrix $A$. Hence, it can be written as a polynomial in $A$. The coefficients will generally depend on $\theta$, and the minimum value of $L$ in (B.6) equals the degree of the minimum polynomial of $A$.

To find the coefficients $\beta_i(\theta)$, write the minimum polynomial of $A$ as

$$d(\lambda) = \sum_{j=0}^{L} \alpha_j\, \lambda^j, \qquad \alpha_L = 1 \tag{B.7}$$

Then observe that, by the definition of the minimum polynomial,

$$\sum_{i=0}^{L} \alpha_i\, A^i = 0 \tag{B.8}$$

($0$ is the null matrix). Equate now the right-hand sides of (B.5) and (B.6):

$$I = (I - A\,e^{-j\theta}) \sum_{i=0}^{L-1} \beta_i(\theta)\, A^i \tag{B.9}$$

From (B.9), taking (B.8) into account, we get, after some algebra:

$$\beta_h(\theta) = \frac{\displaystyle\sum_{\ell=1}^{L-h} \alpha_{\ell+h}\, e^{j\ell\theta}}{d(e^{j\theta})}, \qquad 0 \le h \le L-1 \tag{B.10}$$

Eq. (B.6) can also be rearranged according to the powers of $e^{j\theta}$, giving

$$A(\omega) = \frac{\displaystyle\sum_{\ell=1}^{L} e^{j\ell\theta} \left( \sum_{h=0}^{L-\ell} \alpha_{\ell+h}\, A^h \right)}{d(e^{j\theta})} \tag{B.11}$$

Notice that the matrices appearing in (B.11) as coefficients of $e^{j\ell\theta}$, $1 \le \ell \le L$, can be evaluated once and for all at the beginning of the computation.

It can be observed that the hypothesis that $d(\lambda)$ is the minimum polynomial of $A$ has never been used. Actually, every $d(\lambda)$ such that (B.8) holds can be used instead of the minimum polynomial. Of course, the minimum polynomial will give the most compact form for $A(\omega)$ and, hence, for the spectrum.

As an example, the use of the characteristic polynomial of $A$ leads to a simple computational algorithm, due to Faddeev and first applied to this problem by Cariolaro and Tronca (1974). According to this technique, $A(\omega)$ can be evaluated as follows:

$$A(\omega) = \frac{B(e^{j\theta})}{\Delta(e^{j\theta})} \tag{B.12}$$

where $B(\cdot)$ is an $M \times M$ matrix polynomial:

$$B(e^{j\theta}) = I\, e^{jM\theta} + B_1\, e^{j(M-1)\theta} + \cdots + B_{M-1}\, e^{j\theta} \tag{B.13}$$

and $\Delta(\cdot)$ is the characteristic polynomial of $A$:

$$\Delta(e^{j\theta}) = e^{jM\theta} + \delta_1\, e^{j(M-1)\theta} + \cdots + \delta_M \tag{B.14}$$

The polynomials $B(\cdot)$ and $\Delta(\cdot)$ can be computed simultaneously using the following algorithm (Gantmacher, 1960); starting with

$$B_0 = I$$

let

$$C_k = A\, B_{k-1}$$

$$\delta_k = -\frac{1}{k}\, \mathrm{tr}\, C_k$$

$$B_k = C_k + \delta_k\, I$$

for $k = 1, 2, \ldots, M$. At the final step, we must have $B_M = 0$, the null matrix.

With this algorithm, the coefficients of the polynomials whose ratio gives $A(\omega)$ can be computed only once, giving an expression for the spectrum which can be computed for several values of $\omega$ with a limited computational effort.
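The recursion above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used for the Report: matrices are plain nested lists, and the function names (`mat_mul`, `faddeev_leverrier`, `series_value`) are hypothetical. The first function is run once; the second evaluates the ratio of the two polynomials at any desired value of the angle, with no matrix inversion.

```python
import cmath
from typing import List, Sequence, Tuple

Matrix = List[List[complex]]


def mat_mul(X: Sequence[Sequence[complex]], Y: Sequence[Sequence[complex]]) -> Matrix:
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def faddeev_leverrier(A: Sequence[Sequence[complex]]) -> Tuple[List[Matrix], List[complex], Matrix]:
    """Return ([B_0, ..., B_{M-1}], [delta_1, ..., delta_M], B_M).

    B_M is returned only as a check: it should be (numerically) null.
    """
    M = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
    B: List[Matrix] = [identity]
    deltas: List[complex] = []
    for k in range(1, M + 1):
        C = mat_mul(A, B[-1])                           # C_k = A B_{k-1}
        delta = -sum(C[i][i] for i in range(M)) / k     # delta_k = -(1/k) tr C_k
        deltas.append(delta)
        B.append([[C[i][j] + (delta if i == j else 0.0) for j in range(M)]
                  for i in range(M)])                   # B_k = C_k + delta_k I
    residual = B.pop()
    return B, deltas, residual


def series_value(B: List[Matrix], deltas: List[complex], theta: float) -> Matrix:
    """Evaluate B(e^{j theta}) / Delta(e^{j theta}) for one value of theta."""
    M = len(B)
    z = cmath.exp(1j * theta)
    den = z ** M + sum(d * z ** (M - k) for k, d in enumerate(deltas, start=1))
    return [[sum(B[k][i][j] * z ** (M - k) for k in range(M)) / den
             for j in range(M)] for i in range(M)]
```

For a small chain, the result can be compared with a truncated version of the series itself, which converges geometrically because all eigenvalues of the matrix have magnitude less than 1.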
APPENDIX C:  A SERIES EXPANSION FOR HARMONIZABLE RANDOM PROCESSES


In this Appendix, a heuristic derivation is presented of a result, due to Masry et al. (1968) and to Campbell (1968), on series representations of wide-sense stationary stochastic processes. Similar results have been obtained for harmonizable processes by Cambanis and Liu (1970), and for the more general class of weakly continuous processes by Cambanis and Masry (1971).

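Let $x(t)$ be a wide-sense stationary process; we want to write it as

$$x(t) = \sum_n \xi_n\, s_n(t), \qquad -\infty < t < \infty \tag{C.1}$$

Define first the Fourier transform of the process:

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \tag{C.2}$$

so that $x(t)$ can be represented as

$$x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega \tag{C.3}$$

and (see Appendix A)

$$E\{X(\omega_1)\,X^*(\omega_2)\} = 2\pi\, \mathcal{G}(\omega_1)\, \delta(\omega_1 - \omega_2) \tag{C.4}$$

where $\mathcal{G}(\cdot)$ is the average power spectrum of $x(t)$.

Consider now the Hilbert space $\mathcal{H}(\mathcal{G})$ of the functions $f(\omega)$ with norm

$$\|f\|^2 \triangleq \frac{1}{2\pi}\int_{-\infty}^{\infty} |f(\omega)|^2\, \mathcal{G}(\omega)\, d\omega \tag{C.5}$$

and scalar product

$$(f, g) \triangleq \frac{1}{2\pi}\int_{-\infty}^{\infty} f(\omega)\, g^*(\omega)\, \mathcal{G}(\omega)\, d\omega \tag{C.6}$$

Let $(\psi_n)$ be a complete, orthonormal set in $\mathcal{H}(\mathcal{G})$. Since

$$\|e^{j\omega t}\|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} |e^{j\omega t}|^2\, \mathcal{G}(\omega)\, d\omega < \infty \tag{C.7}$$

we get

$$e^{j\omega t} \in \mathcal{H}(\mathcal{G}) \tag{C.8}$$

so that we can expand it in the form

$$e^{j\omega t} = \sum_n s_n(t)\, \psi_n(\omega) \tag{C.9}$$

where

$$s_n(t) = (e^{j\omega t}, \psi_n) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{j\omega t}\, \psi_n^*(\omega)\, \mathcal{G}(\omega)\, d\omega. \tag{C.10}$$

Multiplying both sides of (C.9) by $X(\omega)$ and integrating, we get, using (C.3):

$$x(t) = \sum_n \xi_n\, s_n(t) \tag{C.11}$$

where

$$\xi_n \triangleq \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, \psi_n(\omega)\, d\omega. \tag{C.12}$$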
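The random variables $\xi_n$ are uncorrelated; in fact

$$\begin{aligned}
E\{\xi_n\, \xi_m^*\} &= \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{X(\omega_1)\,X^*(\omega_2)\}\, \psi_n(\omega_1)\, \psi_m^*(\omega_2)\, d\omega_1\, d\omega_2 \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{G}(\omega_1)\, \delta(\omega_1 - \omega_2)\, \psi_n(\omega_1)\, \psi_m^*(\omega_2)\, d\omega_1\, d\omega_2 \\
&= (\psi_n, \psi_m) = \delta_{mn}
\end{aligned}$$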

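Example

Suppose that

$$\mathcal{G}(\omega) = \begin{cases} G_0, & |\omega| < \Omega \\ 0, & \text{elsewhere} \end{cases}$$

The system

$$\psi_n(\omega) = \sqrt{\frac{\pi}{G_0\,\Omega}}\; e^{jn(\pi/\Omega)\omega}, \qquad |\omega| < \Omega$$

is orthonormal and complete in $\mathcal{H}(\mathcal{G})$.

We have

$$s_n(t) = \frac{1}{2\pi}\sqrt{\frac{\pi}{G_0\,\Omega}}\int_{-\Omega}^{\Omega} e^{j\omega(t - n\pi/\Omega)}\, G_0\, d\omega = \sqrt{\frac{G_0\,\Omega}{\pi}}\; \frac{\sin(\Omega t - n\pi)}{\Omega t - n\pi} \tag{C.13}$$

and consequently

$$x(t) = \sum_n \eta_n\, \frac{\sin(\Omega t - n\pi)}{\Omega t - n\pi}$$

where $\eta_n = \sqrt{G_0\,\Omega/\pi}\;\xi_n$, so that

$$E\{\eta_n\, \eta_m^*\} = \frac{G_0\,\Omega}{\pi}\, \delta_{nm}$$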
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APPENDIX  D:  INEQUALITIES  BASED  ON  MOMENTS 


D.1 Statement and Solution of the Problem

Let $X$ be a continuous random variable whose range $[a,b]$ and whose first $N$ moments

$$\mu_i \triangleq E\{X^i\}, \qquad i = 1, 2, \ldots, N$$

are known. In this Appendix, we shall consider the two following problems:

(i) Find sharp upper and lower bounds to the probability

$$\Pr\{X \le \lambda\}$$

where $\lambda \in (a,b)$ is a known quantity.

(ii) Find sharp upper and lower bounds to the average

$$E\{\Omega(X)\}$$

where $\Omega(\cdot)$ is a known function.

By "sharp" I mean that the bounds cannot be further improved; i.e., that random variables do exist that meet the bounds while having range $[a,b]$ and moments $\mu_1, \ldots, \mu_N$. No attempt will be made to get closed forms for the results. Instead, numerical techniques will be sought that are general enough to handle a large class of situations and computationally suitable from the viewpoint of speed and accuracy.

Consider the set $\mathcal{D}(\mu_1, \ldots, \mu_N)$ of the distribution functions $F(\cdot)$ having range in $[a,b]$ and first $N$ moments

$$\mu_i = \int_a^b x^i\, dF(x), \qquad i = 1, \ldots, N \tag{D.1}$$


In  general,  there  are  many  distribution  functions  having  such  a set 
of  moments;  actually,  they  form  a closed,  convex  set  in  the  space  of  all 
distribution  functions.  Some  points  in  this  space,  however,  play  a key 
role  in  the  solution  of  our  problems. 

Consider the subset $\mathcal{S}$ of piecewise constant distribution functions having a finite number $\nu$ of points of increase, say $(x_1, x_2, \ldots, x_\nu)$, and saltuses $(w_1, w_2, \ldots, w_\nu)$. For these functions,

$$F(x) = \sum_{i:\, x_i \le x} w_i \tag{D.2}$$

Thus, for any distribution function $F \in \mathcal{S} \cap \mathcal{D}(\mu_1, \ldots, \mu_N)$, we get

$$\mu_i = \sum_{j=1}^{\nu} w_j\, x_j^i \tag{D.3}$$

Define now the function

$$\epsilon(x_i) = \begin{cases} 2, & a < x_i < b \\ 1, & x_i = a \text{ or } x_i = b \end{cases} \tag{D.4}$$

Then the index of the distribution function $F \in \mathcal{S}$ is defined as

$$I_F \triangleq \sum_{i=1}^{\nu} \epsilon(x_i). \tag{D.5}$$

According to the index value, a distribution function $F$ belonging to $\mathcal{S} \cap \mathcal{D}(\mu_1, \ldots, \mu_N)$ will be called

canonical, if $I_F \le N+2$

principal, if $I_F = N+1$
Two results due to Krein (1951) are the key to solving both problems (i) and (ii) (see also Karlin and Studden (1966), and the references therein):

Proposition  1 

For any $N$, there are only two principal distribution functions. If $N$ is odd, say $N = 2n-1$, their points of increase satisfy

$$a < x_1 < x_2 < \cdots < x_n < b$$

and

$$a = x_1 < x_2 < \cdots < x_n < x_{n+1} = b$$

respectively.

If $N$ is even, say $N = 2n$, their points of increase satisfy

$$a = x_1 < x_2 < \cdots < x_{n+1} < b$$

and

$$a < x_1 < x_2 < \cdots < x_n < x_{n+1} = b$$

respectively.


Proposition  2 

If one of the $x_i$ is given in advance, $x_i \in (a,b)$, then there exists a unique canonical distribution function including this point.
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Consider  now  the  solution  to  problems  (i)  and  (ii); 


Theorem  1 (Krein  1951) 

If $\lambda$ is a point of continuity of the distribution of $X$, then

$$\sum_{x_i < \lambda} w_i \;\le\; \Pr\{X \le \lambda\} \;\le\; \sum_{x_i \le \lambda} w_i \tag{D.6}$$

where $x_i$, $w_i$ are the points of increase and the saltuses of the canonical distribution in $\mathcal{D}(\mu_1, \ldots, \mu_N)$ including the point $\lambda$.


Theorem 2 (Krein 1951)

If $\Omega(t)$ has a continuous $(N+1)$-th derivative, and $\Omega^{(N+1)}(t)$ is everywhere in $[a,b]$ either concave or convex, then

$$\int_a^b \Omega(x)\, dF'(x) \;\le\; E\{\Omega(X)\} \;\le\; \int_a^b \Omega(x)\, dF''(x) \tag{D.7}$$

where $F'(\cdot)$ and $F''(\cdot)$ are the principal distribution functions in $\mathcal{D}(\mu_1, \ldots, \mu_N)$.

It is most important to observe that $F'(\cdot)$ and $F''(\cdot)$, which give the smallest and greatest values of $E\{\Omega(X)\}$, do not depend at all on the choice of the function $\Omega(t)$, provided that the convexity requirement in the statement of the theorem is met.

Actually, a solution to the problem of finding sharp upper and lower bounds to $E\{\Omega(X)\}$ when $\Omega^{(N+1)}(t)$ is neither convex nor concave can also be found, but it involves such a large amount of computation that it is unsuitable for applications.
D.2 Computational Algorithms

Once the solution to both problems (i) and (ii) has been found, the problem that arises is to devise simple algorithms to obtain the principal and canonical distribution functions in $\mathcal{D}(\mu_1, \ldots, \mu_N)$. So far, only problem (ii) has yielded a solution, which I shall present in this section. The rationale behind it should prove useful also to the solution of problem (i), although some modifications are needed.

Essentially, deriving principal and canonical distribution functions is equivalent to finding three sets of numbers $\{x_i\}_{i=1}^{\nu}$, $\{y_i\}_{i=1}^{p}$, $\{w_i\}_{i=1}^{\nu+p}$, $a \le x_i \le b$, $w_i \ge 0$, such that, given the set of moments $\mu_1, \ldots, \mu_N$, the following holds:

$$\mu_i = \sum_{k=1}^{\nu} w_k\, x_k^i + \sum_{k=1}^{p} w_{\nu+k}\, y_k^i, \qquad i = 1, \ldots, N \tag{D.8}$$

Specifically:

- for the principal distribution functions, $N$ odd:

$$\nu = \frac{N+1}{2}, \qquad p = 0 \tag{D.9}$$

and

$$\nu = \frac{N-1}{2}, \qquad p = 2 \quad (y_1 = a,\; y_2 = b); \tag{D.10}$$

- for the principal distribution functions, $N$ even:

$$\nu = \frac{N}{2}, \qquad p = 1 \quad (y_1 = a) \tag{D.11}$$

and

$$\nu = \frac{N}{2}, \qquad p = 1 \quad (y_1 = b) \tag{D.12}$$
- for the canonical distribution function including point $\lambda$:
v s • p21 


(D.13) 


This problem, as we shall soon see, is equivalent to the problem of finding approximate integration formulas with a given degree of exactness. In fact, consider the approximation

$$\int_a^b f(x)\, dF(x) \cong \int_a^b f(x)\, dF_{\nu,p}(x) \tag{D.14}$$

where $F_{\nu,p}$ is a distribution function with $\nu + p$ points of increase, $p$ of which are fixed. Define the remainder

$$R(f) \triangleq \int_a^b f(x)\, dF(x) - \int_a^b f(x)\, dF_{\nu,p}(x). \tag{D.15}$$

We say that (D.14) has degree of exactness $s$ if the remainder is zero for $f(x) = x^\ell$, $\ell = 0, 1, \ldots, s$, and is nonzero for $f(x) = x^{s+1}$. In other words, (D.14) has degree of exactness $s$ if $F_{\nu,p}(\cdot)$ is a discrete distribution function whose first $s$ moments take the same value as the corresponding moments of $F(\cdot)$.

Thus, the problem of finding the principal distribution function is equivalent to the problem of finding approximations of the form (D.14) to the original distribution function $F(\cdot)$.

The degree of exactness of the approximation must of course be equal to $N$, the number of known moments of the distribution $F(\cdot)$. We have the following result (see Krylov (1962), p. 161):
Theorem 3

Let $\{x_1, \ldots, x_\nu, y_1, \ldots, y_p\}$ be the set of points of increase of $F_{\nu,p}(\cdot)$, and $\{y_1, \ldots, y_p\}$ be the set of fixed points. The maximum degree of exactness of (D.14) is

$$2\nu + p - 1$$

and this is achieved only if

$$E\{p_\nu(X)\, \rho(X)\, X^\ell\} = 0, \qquad \ell = 0, 1, \ldots, \nu-1 \tag{D.16}$$

where

$$p_\nu(x) \triangleq (x - x_1) \cdots (x - x_\nu) \tag{D.17}$$

and

$$\rho(x) \triangleq (x - y_1) \cdots (x - y_p). \tag{D.18}$$

As we can see from (D.9)-(D.12), the problem of finding principal distribution functions is equivalent to the problem of finding approximations of the form (D.14) with the maximum degree of exactness.

Consider first the situation $p = 0$ (no preassigned point of increase for the principal distribution function). Then the maximum degree of exactness of (D.14) is $2\nu - 1$, and this is achieved only if

$$E\{p_\nu(X)\, X^\ell\} = 0, \qquad \ell = 0, 1, \ldots, \nu-1. \tag{D.19}$$

The numerical algorithm for finding the points $x_1, \ldots, x_\nu$ in this case has been developed by Golub and Welsch (1969).
Condition (D.19) is equivalent to saying that $p_\nu(\cdot)$ is the $\nu$-th term in a sequence of orthogonal polynomials

$$\{p_i(\cdot)\}_{i \ge 0}, \qquad \deg p_i(\cdot) = i \tag{D.20}$$

such that

$$E\{p_n(X)\, p_m(X)\} = \sigma_n^2\, \delta_{nm}, \qquad \sigma_n^2 > 0 \tag{D.21}$$

Its zeros are then the points of increase of the distribution $F_{\nu,0}$.

These polynomials satisfy a three-term recurrence relationship of the form

$$p_j(x) = (a_j x + b_j)\, p_{j-1}(x) - c_j\, p_{j-2}(x), \qquad j = 1, 2, \ldots, \nu \tag{D.22}$$

with initial values

$$p_{-1}(x) \equiv 0, \qquad p_0(x) \equiv 1$$

and $a_j > 0$, $c_j > 0$.

To construct these polynomials, rewrite system (D.22) as follows:

$$x\, \mathbf{p}(x) = T\, \mathbf{p}(x) + \frac{1}{a_\nu}\, p_\nu(x)\, \mathbf{e}_\nu \tag{D.23}$$

where $\mathbf{p}(x)$ is the column vector of polynomials $p_0(x), \ldots, p_{\nu-1}(x)$, $T$ is a tridiagonal matrix with

$$\begin{aligned}
(T)_{i,i} &= -b_i / a_i, & i &= 1, \ldots, \nu \\
(T)_{i,i+1} &= 1 / a_i, & i &= 1, \ldots, \nu-1 \\
(T)_{i+1,i} &= c_{i+1} / a_{i+1}, & i &= 1, \ldots, \nu-1
\end{aligned} \tag{D.24}$$
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and 


(D. 25) 


Thus,  we  can  see  that  Pv(x)  has  a zero  at  x=x^  if  and  only  if 


xi  P^V  = I P(xi^ 


(D.26) 


i.e.,  x^  is  an  eigenvalue  of  the  tridiagonal  matrix  T . 

Hence,  the  problem  of  computing  the  set  of  points  of  increase  of  Fy 
is  reduced  to  the  problem  of  evaluating  the  eigenvalues  of  a tridiagonal 
matrix. 

It can also be shown that, if $\mathbf{p}(x_i)$ is the eigenvector of $T$
corresponding to the eigenvalue $x_i$, and in addition $\sigma_n = 1$ in (D.21), then
the choice

\[ w_i = \frac{1}{\mathbf{p}^T(x_i)\,\mathbf{p}(x_i)}, \qquad i = 1, \ldots, \nu \tag{D.27} \]

will give the saltuses of $F_{\nu,0}$ that achieve the degree of exactness
$2\nu-1$. In conclusion, the solution of an eigenvalue and eigenvector problem
will completely specify the principal distribution function $F_{\nu,0}$.
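
As an illustration of the eigenvalue procedure of (D.22)-(D.27), the following
Python sketch builds the tridiagonal matrix $T$ from given recurrence
coefficients, takes its eigenvalues as the points of increase, and evaluates
the saltuses through (D.27). NumPy, the function name, and the test
distribution are assumptions made here for illustration and are not part of
the Report. The polynomials are assumed orthonormal; the example takes $X$
uniform on $[-1,1]$, for which $a_j = \sqrt{4j^2-1}/j$, $b_j = 0$,
$c_j = a_j/a_{j-1}$.

```python
import numpy as np

def nodes_and_saltuses(a, b, c):
    """Points of increase and saltuses from the coefficients of (D.22).

    a, b, c hold a_j, b_j, c_j for j = 1..nu (array index j-1); c[0] is
    unused because it multiplies p_{-1} = 0. Orthonormal polynomials assumed.
    """
    nu = len(a)
    T = np.zeros((nu, nu))
    for i in range(nu):
        T[i, i] = -b[i] / a[i]                  # diagonal of (D.24)
        if i + 1 < nu:
            T[i, i + 1] = 1.0 / a[i]            # superdiagonal of (D.24)
            T[i + 1, i] = c[i + 1] / a[i + 1]   # subdiagonal of (D.24)
    x = np.sort(np.linalg.eigvals(T).real)      # zeros of p_nu, see (D.26)

    w = np.empty(nu)
    for k, xk in enumerate(x):
        p_prev, p_curr = 0.0, 1.0               # p_{-1}(x), p_0(x)
        p = [p_curr]
        for j in range(nu - 1):                 # builds p_1 .. p_{nu-1}
            p_prev, p_curr = p_curr, (a[j] * xk + b[j]) * p_curr - c[j] * p_prev
            p.append(p_curr)
        w[k] = 1.0 / np.dot(p, p)               # saltus from (D.27)
    return x, w

# Illustrative input (an assumption): X uniform on [-1, 1], nu = 3
nu = 3
j = np.arange(1, nu + 1)
beta = j / np.sqrt(4.0 * j**2 - 1.0)
a = 1.0 / beta
b = np.zeros(nu)
c = np.concatenate(([0.0], beta[:-1] / beta[1:]))   # c[0] is a placeholder
x, w = nodes_and_saltuses(a, b, c)
```

For this input the points should coincide with the three-point Gauss-Legendre
nodes, and the saltuses with half the Gauss-Legendre weights, since the
uniform density on $[-1,1]$ equals $1/2$.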

The problem now is to construct the polynomials $\{p_i(x)\}_{i=0}^{\nu}$ that
satisfy the orthogonality relationship (D.20). This can be done starting
from the knowledge of the moments

\[ \mu_j = E\{X^j\}, \qquad j = 0, 1, \ldots, 2\nu \]

and using a procedure due to Mysovskih (1968).


Let $\sigma_0(x), \ldots, \sigma_n(x)$ be $n+1$ linearly independent functions, and
let $M = (m_{ij})$ be the $(n+1)\times(n+1)$ matrix with elements

\[ m_{ij} = E\{\sigma_i(X)\,\sigma_j(X)\}, \qquad i, j = 0, 1, \ldots, n. \tag{D.28} \]

Clearly, $M$ is positive definite. Let

\[ M = R'R \tag{D.29} \]

be the Cholesky decomposition of $M$, where $R$ is upper-triangular and
$R'$ lower-triangular. If

\[ S = R^{-1}, \qquad S = (s_{ij}) \tag{D.30} \]

($S$ an upper-triangular matrix), then it can easily be proved that

\[ \phi_j(x) = \sum_{k=0}^{j} s_{kj}\,\sigma_k(x), \qquad j = 0, 1, \ldots, n \tag{D.31} \]

form an orthonormal system, i.e.,

\[ E\{\phi_i(X)\,\phi_j(X)\} = \delta_{ij}. \tag{D.32} \]

In particular, if

\[ \sigma_i(x) = x^i \tag{D.33} \]

then $\phi_j(x)$ are polynomials, exactly those we are looking for. In this
case, the matrix $M$ is the Hankel matrix with elements

\[ m_{ij} = \mu_{i+j} = E\{X^{i+j}\}, \qquad i, j = 0, 1, \ldots, n. \tag{D.34} \]

To build the polynomial $p_\nu(x)$, we need thus the moments $\mu_0 = 1, \mu_1, \ldots, \mu_{2n}$.

Actually, only $2n-1$ moments are needed to construct the set of orthogonal
polynomials that we are looking for. In fact, the role of $\mu_{2n}$ is
just that of normalizing the system of orthogonal polynomials, and its
value affects neither the points of increase nor the saltuses of $F_{\nu,0}(\cdot)$
(Gautschi 1970, p.256).
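
The Cholesky construction of (D.28)-(D.34) can be sketched in a few lines of
Python. This is only a minimal illustration under stated assumptions: NumPy is
used, the function name is hypothetical, and the moments are those of $X$
uniform on $[-1,1]$, chosen here as a convenient test case. Note that NumPy
returns the lower-triangular Cholesky factor, so the upper-triangular factor
of (D.29) is its transpose.

```python
import numpy as np

def orthonormal_polynomials(mu, n):
    """Hankel matrix M and coefficient matrix S from the moments mu[0..2n].

    Column j of S holds the coefficients of phi_j in ascending powers of x,
    i.e. the s_kj of (D.31) with the monomial basis x**k of (D.33).
    """
    mu = np.asarray(mu, dtype=float)
    M = np.array([[mu[i + j] for j in range(n + 1)] for i in range(n + 1)])
    R = np.linalg.cholesky(M).T   # upper-triangular factor, M = R'R as in (D.29)
    S = np.linalg.inv(R)          # upper triangular, (D.30)
    return M, S

# Illustrative moments (an assumption): X uniform on [-1, 1],
# mu_k = 1/(k+1) for even k and 0 for odd k.
n = 2
mu = [1.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(2 * n + 1)]
M, S = orthonormal_polynomials(mu, n)

gram = S.T @ M @ S    # matrix of E{phi_i(X) phi_j(X)}, expected to be the identity
zeros = np.sort(np.polynomial.polynomial.polyroots(S[:, n]).real)  # zeros of phi_n
```

Here `gram` checks the orthonormality (D.32) numerically, and `zeros` gives
the points of increase for $\nu = n = 2$, which for this test case are
$\pm 1/\sqrt{3}$.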

Consider now the case $p=1$. Suppose first that

\[ y_1 = a \tag{D.35} \]

so that

\[ p(x) = x-a, \qquad a \le x \le b. \tag{D.36} \]

To construct the approximation (D.14) corresponding to this case and having
degree of exactness $2\nu$, we must find a polynomial $p_\nu(x)$ such that

\[ E\{p_\nu(X)\,(X-a)\,X^\ell\} = 0, \qquad \ell = 0, 1, \ldots, \nu-1. \tag{D.37} \]

This problem will be solved by reducing it to the same problem solved
previously. In fact, condition (D.37) can be rewritten as

\[ \int_a^b p_\nu(x)\,x^\ell\,d\tilde F(x) = 0, \qquad \ell = 0, 1, \ldots, \nu-1 \tag{D.38} \]

where $\tilde F(\cdot)$ is a new distribution function such that

\[ d\tilde F(x) = \frac{x-a}{\mu_1-a}\,dF(x). \tag{D.39} \]

The moments of $\tilde F(x)$ are thus given by

\[ \tilde\mu_\ell = \int_a^b x^\ell\,\frac{x-a}{\mu_1-a}\,dF(x) = \frac{\mu_{\ell+1} - a\,\mu_\ell}{\mu_1 - a}, \qquad \ell = 0, 1, \ldots \tag{D.40} \]


Using this set of modified moments, we are thus able to compute the
polynomial $p_\nu(x)$, and consequently its zeros. These zeros give the
locations of the points of increase of the distribution $F_{\nu,1}$ requested.

To get the saltuses at these points that allow (D.14) to attain its
maximum degree of exactness, we can use (Krylov 1962, p.164):

\[ w_i = \frac{\mu_1 - a}{p(x_i)}\,\tilde w_i, \qquad i = 1, \ldots, \nu \tag{D.41} \]

where $\tilde w_i$ are the saltuses obtained by using (D.27). To compute the
saltus $z_1$ at $y_1 = a$, we need only observe that the sum of the saltuses
must be equal to $\mu_0 = 1$. Thus, from (D.8),

\[ z_1 = 1 - \sum_{i=1}^{\nu} w_i. \tag{D.42} \]
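
The case of one preassigned point at $y_1 = a$ can be sketched numerically as
follows. The Python code below is a minimal illustration under stated
assumptions, not a definitive implementation: NumPy is used, both function
names are hypothetical, and the interior rule is obtained from the Hankel and
Cholesky construction rather than from the tridiagonal eigenvalue problem.
With this construction the moments $\mu_0, \ldots, \mu_{2\nu+1}$ are supplied,
although the highest modified moment only normalizes the polynomials. The test
case, $X$ uniform on $[-1,1]$ with $a = -1$ and $\nu = 1$, is also an
assumption made for illustration.

```python
import numpy as np

def gauss_rule(mu, nu):
    """nu-point rule (points, saltuses) from the moments mu[0..2*nu], mu[0] = 1."""
    M = np.array([[mu[i + j] for j in range(nu + 1)] for i in range(nu + 1)])
    S = np.linalg.inv(np.linalg.cholesky(M).T)   # orthonormal coefficients
    x = np.sort(np.polynomial.polynomial.polyroots(S[:, nu]).real)
    V = np.vander(x, nu, increasing=True)        # V[i, k] = x_i**k
    P = V @ S[:nu, :nu]                          # P[i, j] = phi_j(x_i), j < nu
    return x, 1.0 / np.sum(P**2, axis=1)         # saltuses as in (D.27)

def rule_with_left_endpoint(mu, a, nu):
    """Rule with the preassigned point y_1 = a; uses the moments mu[0..2*nu+1]."""
    mu = np.asarray(mu, dtype=float)
    mu_mod = (mu[1:] - a * mu[:-1]) / (mu[1] - a)   # modified moments (D.40)
    x, w_mod = gauss_rule(mu_mod, nu)
    w = (mu[1] - a) / (x - a) * w_mod               # (D.41) with p(x) = x - a
    z1 = 1.0 - w.sum()                              # (D.42)
    return x, w, z1

# Illustrative input (an assumption): X uniform on [-1, 1], a = -1, nu = 1
mu = [1.0, 0.0, 1.0 / 3.0, 0.0]
x, w, z1 = rule_with_left_endpoint(mu, a=-1.0, nu=1)
```

For this test case the interior point is $x_1 = 1/3$ with saltus $3/4$, and
the saltus at $a = -1$ is $1/4$; the resulting two-point rule reproduces
$\mu_0$, $\mu_1$, and $\mu_2$, in agreement with the degree of exactness
$2\nu = 2$.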


Similar results hold for $y_1 = b$; in this case $p(x) = b-x$ and the procedure
is the same. In particular, the modified set of moments is now

\[ \tilde\mu_\ell = \frac{b\,\mu_\ell - \mu_{\ell+1}}{b - \mu_1}, \qquad \ell = 0, 1, \ldots \tag{D.43} \]


Consider finally the case $p=2$. Now

\[ y_1 = a, \qquad y_2 = b \tag{D.44} \]

and

\[ p(x) = (x-a)(b-x). \tag{D.45} \]

To construct the principal distribution function, we must find a polynomial
$p_\nu(x)$ such that

\[ E\{p_\nu(X)\,(X-a)(b-X)\,X^\ell\} = 0, \qquad \ell = 0, 1, \ldots \tag{D.46} \]

Defining the new distribution function

\[ d\tilde F(x) = \frac{(x-a)(b-x)}{-\mu_2 + (a+b)\mu_1 - ab}\,dF(x), \tag{D.47} \]

then condition (D.46) is equivalent to

\[ \int_a^b p_\nu(x)\,x^\ell\,d\tilde F(x) = 0, \qquad \ell = 0, 1, \ldots \tag{D.48} \]


The polynomial $p_\nu(x)$, hence its roots, can be obtained using the
technique previously outlined. To get the corresponding saltuses, we
can use

\[ w_i = \frac{-\mu_2 + (a+b)\mu_1 - ab}{p(x_i)}\,\tilde w_i. \tag{D.49} \]


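To compute $z_1$ and $z_2$, we can observe that in particular, since the
degree of exactness of (D.14) must be at least 1,

\[ z_1 + z_2 + \sum_{i=1}^{\nu} w_i = 1, \qquad a\,z_1 + b\,z_2 + \sum_{i=1}^{\nu} w_i\,x_i = \mu_1. \tag{D.50} \]

We have two equations in two unknowns that can be solved to give $z_1$
and $z_2$.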


Example 

Let $a = -1$, $b = 1$, and

\[ \mu_1 = 0, \qquad \mu_2 = \sigma^2, \qquad \mu_3 = 0. \]

The principal distribution functions with this set of moments have points
of increase

\[ -1 < x_1 < x_2 < 1 \]

and

\[ -1 = x_1 < x_2 < x_3 = 1, \]

respectively.

Compute first the principal distribution function with points of
increase internal to the interval $(-1,1)$. We get the orthogonal polynomial

\[ p_2(x) = x^2 - \sigma^2 \]

which has roots

\[ x_1 = -\sigma, \qquad x_2 = \sigma. \]

The corresponding saltuses are given by

\[ w_1 = w_2 = \tfrac{1}{2}. \]

Compute next the principal distribution function with points of increase
at $\pm 1$. In this case, we get only one moment:

\[ \tilde\mu_1 = 0. \]

The corresponding orthogonal polynomial has degree 1, and unique
root 0. The saltus is then $\tilde w_1 = 1$ and, using (D.49):

\[ w_1 = 1 - \sigma^2. \]

The saltuses corresponding to points $\pm 1$ can be obtained solving (D.50)

\[ 1 = z_1 + z_2 + (1 - \sigma^2), \qquad 0 = -z_1 + z_2 \]

which gives

\[ z_1 = z_2 = \sigma^2/2. \]
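
The two distributions of this Example can be checked numerically. The short
Python sketch below (NumPy assumed; the value $\sigma = 0.5$ and all variable
names are illustrative choices, not taken from the Report) rebuilds both sets
of points and saltuses and verifies that each reproduces the moments
$\mu_0, \ldots, \mu_3$.

```python
import numpy as np

sigma = 0.5                                   # illustrative value (assumption)
a, b = -1.0, 1.0
mu = np.array([1.0, 0.0, sigma**2, 0.0])      # mu_0 .. mu_3

# Points internal to (-1, 1): zeros of x^2 - sigma^2, equal saltuses
x_int = np.array([-sigma, sigma])
w_int = np.array([0.5, 0.5])

# Preassigned points at -1 and +1: a single interior point at 0
norm = -mu[2] + (a + b) * mu[1] - a * b       # normalizing constant of (D.47)
x_mid = np.array([0.0])
w_mid = norm / ((x_mid - a) * (b - x_mid))    # (D.49) with unit modified saltus

# Two equations in z_1 (at a) and z_2 (at b): match mu_0 and mu_1
A = np.array([[1.0, 1.0], [a, b]])
rhs = np.array([mu[0] - w_mid.sum(), mu[1] - np.dot(w_mid, x_mid)])
z1, z2 = np.linalg.solve(A, rhs)

def moments(points, saltuses, kmax=3):
    """Moments of order 0..kmax of a discrete distribution."""
    return np.array([np.dot(saltuses, points**k) for k in range(kmax + 1)])

m_int = moments(x_int, w_int)
m_end = moments(np.array([a, 0.0, b]), np.array([z1, w_mid[0], z2]))
```

Both `m_int` and `m_end` should equal `mu`, and `z1`, `z2` should both equal
$\sigma^2/2$.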


One final comment is appropriate about the restriction involved in
Theorem 2 with respect to the $(N+1)$-th derivative of $\Omega(t)$. Should this
condition not hold, one can resort to bounds that are not sharp, but can
be easily computed.

In fact, if $N$ is an odd number, the principal distribution function,
all of whose points of increase are internal to $[a,b]$, gives the
approximation

\[ E\{\Omega(X)\} = \sum_{i=1}^{\nu_0} w_i\,\Omega(x_i) + R_{\nu_0}(\Omega), \qquad \nu_0 = \frac{N+1}{2}, \]

valid for all $\Omega(\cdot)$, where the remainder is bounded by

\[ |R_{\nu_0}(\Omega)| \le \frac{\sigma_{\nu_0}^2}{(2\nu_0)!}\,\max_{\xi}\,|\Omega^{(2\nu_0)}(\xi)| \]

and $\sigma_{\nu_0}^2$ is defined in (D.21) (for details, see Benedetto and Biglieri,
1975, and references therein).
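
A small numerical check may help to fix ideas about this remainder bound. The
Python sketch below rests on several assumptions that are not in the Report:
$X$ is uniform on $[-1,1]$, the function being averaged is $e^x$, $N = 3$ so
that two interior points are used, the orthogonal polynomial of degree 2 is
taken monic so that its mean square value plays the role of the constant of
(D.21), and the maximum of the fourth derivative is taken over $[-1,1]$.

```python
import math
import numpy as np

# Assumed test case: X uniform on [-1, 1], function exp(x), two points (N = 3)
nu0 = 2
x = np.array([-1.0, 1.0]) / math.sqrt(3.0)   # zeros of the monic x^2 - 1/3
w = np.array([0.5, 0.5])                     # saltuses (equal by symmetry)

# Mean square value of the monic polynomial: E{(X^2 - 1/3)^2} = 4/45
sigma2 = 1.0 / 5.0 - 2.0 / 9.0 + 1.0 / 9.0

exact = math.sinh(1.0)                       # E{exp(X)} for the uniform law
approx = float(np.dot(w, np.exp(x)))         # two-point approximation
remainder = exact - approx

# Every derivative of exp is exp, whose maximum on [-1, 1] is e
bound = sigma2 / math.factorial(2 * nu0) * math.e
```

In this test case the remainder is positive and smaller than the bound, as
expected from the fact that the fourth derivative of $e^x$ is positive on the
whole interval.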
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